dart-sdk/runtime/vm/object_graph_copy.cc


[vm/concurrency] Implement a fast transitive object copy for isolate message passing

We use message passing as the communication mechanism between isolates. The transitive closure of an object to be sent is currently serialized into a snapshot form and deserialized on the receiver side. Furthermore the receiver side will re-hash any linked hashmaps in that graph.

If isolate groups are enabled, all isolates in a group work on the same heap. That removes the need for an intermediate serialization format. It also removes the need for an O(n) step on the receiver side.

This CL implements a fast transitive object copy and makes use of it if a message that is to be passed to another isolate stays within the same isolate group.

In the common case the object graph will fit into new space, so the copy algorithm tries to take advantage of that by having a fast path and a fallback path. Both effectively copy the graph in BFS order. The algorithm works much like a scavenge operation, but instead of first copying the from-object to to-space and then re-writing the object in to-space to forward the pointers (which requires writing the to-space memory twice), we only reserve space for to-objects and then initialize each to-object to its final contents, including forwarded pointers (i.e. we write each to-space object only once). Whereas a scavenge operation stores forwarding pointers in the objects themselves, we use a [WeakTable] to store them. This is the only remaining expensive part of the algorithm and could be further optimized. To avoid relying on iterating the to-space, we remember [from, to] addresses.

=> All of this works inside a [NoSafepointOperationScope] and avoids usages of handles as well as write barriers.

While doing the transitive object copy, we share any object we can safely share (canonical objects, strings, send ports, ...) instead of copying it.

If the fast path fails (due to allocation failure or hitting) we handlify any raw pointers and continue with almost the same algorithm in a safe way, where GC is possible at every object allocation site and normal barriers are used for any stores of object pointers. The copy algorithm uses templates to share the copy logic between the fast and slow case (the same copy routines can work on raw pointers as well as handles).

There are a few special things to take into consideration:

* If we copy a view on external typed data, we need to know the external typed data address to compute the inner pointer of the view, so we eagerly initialize external typed data.
* All external typed data needs a finalizer attached (irrespective of whether the object copy succeeds) to ensure the `malloc()`ed data is freed again.
* Transferables are only transferred on successful transitive copies. They also need to attach finalizers to objects (which requires all objects to be in handles).
* We copy linked hashmaps as they are, instead of compressing the data by removing deleted entries. We may need to re-hash those hashmaps on the receiver side (similar to the snapshot-based copy approach) since the new object graph will have no identity hash codes assigned. Though if a hashmap only has sharable objects as keys (very common, e.g. JSON), there is no need for re-hashing.

It changes the SendPort.* benchmarks as follows:

```
Benchmark                                  | default            | IG                   | IG + FOC
--------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw):             | 0.25 us (1 x)      | 0.26 us (0.96 x)     | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw):       | 4.15 us (1 x)      | 1.45 us (2.86 x)     | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw):        | 82.16 us (1 x)     | 27.17 us (3.02 x)    | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw):       | 784.70 us (1 x)    | 242.10 us (3.24 x)   | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw):      | 8510.4 us (1 x)    | 3083.80 us (2.76 x)  | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw):        | 122381.33 us (1 x) | 62959.40 us (1.94 x) | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw):    | 1.91 us (1 x)      | 0.92 us (2.08 x)     | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw):    | 6.32 us (1 x)      | 2.70 us (2.34 x)     | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw):    | 25.24 us (1 x)     | 10.47 us (2.41 x)    | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw):    | 104.08 us (1 x)    | 41.08 us (2.53 x)    | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw):   | 373.39 us (1 x)    | 174.11 us (2.14 x)   | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw):   | 1588.64 us (1 x)   | 893.18 us (1.78 x)   | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw):   | 6849.55 us (1 x)   | 3705.19 us (1.85 x)  | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw):          | 0.67 us (1 x)      | 0.69 us (0.97 x)     | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw):    | 4.37 us (1 x)      | 0.78 us (5.60 x)     | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw):     | 45.67 us (1 x)     | 0.90 us (50.74 x)    | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw):    | 498.81 us (1 x)    | 1.24 us (402.27 x)   | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw):   | 5366.02 us (1 x)   | 4.22 us (1271.57 x)  | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw):     | 101050.88 us (1 x) | 20.81 us (4855.88 x) | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw): | 3.91 us (1 x)      | 0.76 us (5.14 x)     | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw): | 9.90 us (1 x)      | 0.79 us (12.53 x)    | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw): | 33.09 us (1 x)     | 0.87 us (38.03 x)    | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw): | 126.77 us (1 x)    | 0.92 us (137.79 x)   | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw):| 533.09 us (1 x)    | 0.94 us (567.12 x)   | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw):| 2223.23 us (1 x)   | 3.03 us (733.74 x)   | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw):| 8945.66 us (1 x)   | 4.03 us (2219.77 x)  | 4.30 us (2080.39 x)
```

Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
// Copyright (c) 2021, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
#include "vm/object_graph_copy.h"
#include "vm/dart_api_state.h"
#include "vm/flags.h"
#include "vm/heap/weak_table.h"
#include "vm/longjump.h"
#include "vm/object.h"
#include "vm/object_store.h"
#include "vm/snapshot.h"
#include "vm/symbols.h"
#define Z zone_
// The list here contains two kinds of classes of objects
// * objects that will be shared and we will therefore never need to copy
// * objects that user object graphs should never reference
#define FOR_UNSUPPORTED_CLASSES(V) \
V(AbstractType) \
V(ApiError) \
V(Bool) \
V(CallSiteData) \
V(Capability) \
V(Class) \
V(ClosureData) \
V(Code) \
V(CodeSourceMap) \
V(CompressedStackMaps) \
V(ContextScope) \
V(DynamicLibrary) \
V(Error) \
V(ExceptionHandlers) \
V(FfiTrampolineData) \
V(Field) \
V(Function) \
V(FunctionType) \
V(FutureOr) \
V(ICData) \
V(Instance) \
V(Instructions) \
V(InstructionsSection) \
V(InstructionsTable) \
V(Int32x4) \
V(Integer) \
V(KernelProgramInfo) \
V(LanguageError) \
V(Library) \
V(LibraryPrefix) \
V(LoadingUnit) \
V(LocalVarDescriptors) \
V(MegamorphicCache) \
V(Mint) \
V(MirrorReference) \
V(MonomorphicSmiableCall) \
V(Namespace) \
V(Number) \
V(ObjectPool) \
V(PatchClass) \
V(PcDescriptors) \
V(Pointer) \
V(ReceivePort) \
V(RegExp) \
V(Script) \
V(Sentinel) \
V(SendPort) \
V(SingleTargetCache) \
V(Smi) \
V(StackTrace) \
V(SubtypeTestCache) \
V(Type) \
V(TypeArguments) \
V(TypeParameter) \
V(TypeParameters) \
V(TypeRef) \
V(TypedDataBase) \
V(UnhandledException) \
V(UnlinkedCall) \
V(UnwindError) \
V(UserTag) \
V(WeakSerializationReference)
namespace dart {
DEFINE_FLAG(bool,
            enable_fast_object_copy,
            true,
            "Enable fast path for fast object copy.");
DEFINE_FLAG(bool,
            gc_on_foc_slow_path,
            false,
            "Cause a GC when falling off the fast path for fast object copy.");
const char* kFastAllocationFailed = "fast allocation failed";
struct PtrTypes {
  using Object = ObjectPtr;
  static const dart::UntaggedObject* UntagObject(Object arg) {
    return arg.untag();
  }
  static const dart::ObjectPtr GetObjectPtr(Object arg) { return arg; }
  static const dart::Object& HandlifyObject(ObjectPtr arg) {
    return dart::Object::Handle(arg);
  }
#define DO(V)                                                                  \
  using V = V##Ptr;                                                            \
  static Untagged##V* Untag##V(V##Ptr arg) { return arg.untag(); }             \
  static V##Ptr Get##V##Ptr(V##Ptr arg) { return arg; }                        \
  static V##Ptr Cast##V(ObjectPtr arg) { return dart::V::RawCast(arg); }
  CLASS_LIST_FOR_HANDLES(DO)
#undef DO
};
struct HandleTypes {
  using Object = const dart::Object&;
  static const dart::UntaggedObject* UntagObject(Object arg) {
    return arg.ptr().untag();
  }
  static dart::ObjectPtr GetObjectPtr(Object arg) { return arg.ptr(); }
  static Object HandlifyObject(Object arg) { return arg; }
#define DO(V)                                                                  \
  using V = const dart::V&;                                                    \
2021-07-13 19:04:20 +00:00
static Untagged##V* Untag##V(V arg) { return arg.ptr().untag(); } \
static V##Ptr Get##V##Ptr(V arg) { return arg.ptr(); } \
static V Cast##V(const dart::Object& arg) { return dart::V::Cast(arg); }
CLASS_LIST_FOR_HANDLES(DO)
#undef DO
};
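The macro above stamps out per-class helpers (`Untag`, `GetPtr`, `Cast`) so that, as the commit message describes, the same templated copy routines can operate on raw pointers on the fast path and on handles on the slow path. A minimal standalone sketch of that dual-mode pattern (all names here are illustrative, not VM API):

```cpp
#include <cassert>

struct Payload { int value; };

// Fast path flavor: objects are raw pointers; no safepoints, no handles.
struct RawTypes {
  using ObjectRef = Payload*;
  static int Read(ObjectRef o) { return o->value; }
  static void Write(ObjectRef o, int v) { o->value = v; }
};

// Slow path flavor: objects sit behind handle-like boxes that a GC could
// update across allocations.
struct HandleTypes {
  struct Handle { Payload* raw; };
  using ObjectRef = Handle;
  static int Read(ObjectRef o) { return o.raw->value; }
  static void Write(ObjectRef o, int v) { o.raw->value = v; }
};

// One copy routine, shared by both paths via the Types parameter.
template <typename Types>
void CopyPayload(typename Types::ObjectRef from, typename Types::ObjectRef to) {
  Types::Write(to, Types::Read(from));
}
```

Both instantiations compile from the same source, which is how the VM avoids maintaining two parallel copies of the per-class copy logic.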
DART_FORCE_INLINE
static ObjectPtr Marker() {
return Object::unknown_constant().ptr();
}
// Keep in sync with runtime/lib/isolate.cc:ValidateMessageObject
DART_FORCE_INLINE
static bool CanShareObject(ObjectPtr obj, uword tags) {
if ((tags & UntaggedObject::CanonicalBit::mask_in_place()) != 0) {
return true;
}
const auto cid = UntaggedObject::ClassIdTag::decode(tags);
if (cid == kOneByteStringCid) return true;
if (cid == kTwoByteStringCid) return true;
if (cid == kExternalOneByteStringCid) return true;
if (cid == kExternalTwoByteStringCid) return true;
if (cid == kMintCid) return true;
if (cid == kImmutableArrayCid) return true;
if (cid == kNeverCid) return true;
if (cid == kSentinelCid) return true;
if (cid == kStackTraceCid) return true;
#if defined(DART_PRECOMPILED_RUNTIME)
// In JIT mode we have field guards enabled which means
// double/float32x4/float64x2 boxes can be mutable and we therefore cannot
// share them.
if (cid == kDoubleCid || cid == kFloat32x4Cid || cid == kFloat64x2Cid) {
return true;
}
#endif
if (cid == kInt32x4Cid) return true; // No field guards here.
if (cid == kSendPortCid) return true;
if (cid == kCapabilityCid) return true;
if (cid == kRegExpCid) return true;
if (cid == kClosureCid) {
// We can share a closure iff it doesn't close over any state.
return Closure::RawCast(obj)->untag()->context() == Object::null();
}
return false;
}
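Objects that fail the sharing checks above are copied by the BFS loop the commit message describes: forwarding pointers live in a side table (the VM uses a `WeakTable`) rather than in the from-objects, and each to-object is initialized exactly once with already-forwarded pointers. A self-contained sketch of that pattern, with `std::unordered_map` standing in for the weak table and a hypothetical `Node` as the object type:

```cpp
#include <queue>
#include <unordered_map>
#include <vector>

// Hypothetical heap object: a value plus references to other objects.
struct Node {
  int value;
  std::vector<Node*> refs;
};

// Copies the transitive closure of `root` in BFS order.
Node* CopyGraph(Node* root) {
  std::unordered_map<Node*, Node*> forward;  // from-object -> to-object
  std::queue<Node*> worklist;

  auto copy_of = [&](Node* from) -> Node* {
    auto it = forward.find(from);
    if (it != forward.end()) return it->second;  // already forwarded
    Node* to = new Node{from->value, {}};        // reserve the to-object
    forward[from] = to;
    worklist.push(from);
    return to;
  };

  Node* result = copy_of(root);
  while (!worklist.empty()) {
    Node* from = worklist.front();
    worklist.pop();
    Node* to = forward[from];
    // Write the to-object's pointer fields once, already forwarded.
    for (Node* ref : from->refs) to->refs.push_back(copy_of(ref));
  }
  return result;
}
```

Because forwarding is looked up before any allocation, shared substructure and cycles in the source graph are preserved in the copy, and each to-object's memory is written only once.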
// Whether executing `get:hashCode` (possibly in a different isolate) on the
// copy of an object with the given [tags] might return a different answer
// than executing it on the source object (if copying is needed) or on the
// same object (if the object is shared).
DART_FORCE_INLINE
static bool MightNeedReHashing(ObjectPtr object) {
const uword tags = TagsFromUntaggedObject(object.untag());
const auto cid = UntaggedObject::ClassIdTag::decode(tags);
// These use structural hash codes and will therefore always result in the
// same hash codes.
if (cid == kOneByteStringCid) return false;
if (cid == kTwoByteStringCid) return false;
if (cid == kExternalOneByteStringCid) return false;
if (cid == kExternalTwoByteStringCid) return false;
if (cid == kMintCid) return false;
if (cid == kDoubleCid) return false;
if (cid == kBoolCid) return false;
if (cid == kSendPortCid) return false;
if (cid == kCapabilityCid) return false;
if (cid == kNullCid) return false;
// These are shared and use identity hash codes. If they are used as a key in
// a map or a value in a set, they will already have the identity hash code
// set.
if (cid == kImmutableArrayCid) return false;
if (cid == kRegExpCid) return false;
if (cid == kInt32x4Cid) return false;
// We copy these (instead of sharing them) - see [CanShareObject]. They rely
// on the default hashCode implementation, which uses identity hash codes
// (instead of a structural hash code).
if (cid == kFloat32x4Cid || cid == kFloat64x2Cid) {
return !kDartPrecompiledRuntime;
}
// If the [tags] indicate this is a canonical object we'll share it instead
// of copying it. That would suggest we don't have to re-hash maps/sets
// containing this object on the receiver side.
//
// Though the object can be a constant of a user-defined class with a
// custom hash code that is misbehaving (e.g. one that depends on global
// field state, ...). To be on the safe side we'll force re-hashing if such
// objects are encountered in maps/sets.
//
// => We might want to consider changing the implementation to avoid
// re-hashing in such cases in the future and clarify the documentation.
return true;
}
DART_FORCE_INLINE
uword TagsFromUntaggedObject(UntaggedObject* obj) {
return obj->tags_;
}
DART_FORCE_INLINE
void SetNewSpaceTaggingWord(ObjectPtr to, classid_t cid, uint32_t size) {
uword tags = 0;
tags = UntaggedObject::SizeTag::update(size, tags);
tags = UntaggedObject::ClassIdTag::update(cid, tags);
tags = UntaggedObject::OldBit::update(false, tags);
tags = UntaggedObject::OldAndNotMarkedBit::update(false, tags);
tags = UntaggedObject::OldAndNotRememberedBit::update(false, tags);
tags = UntaggedObject::CanonicalBit::update(false, tags);
tags = UntaggedObject::NewBit::update(true, tags);
#if defined(HASH_IN_OBJECT_HEADER)
tags = UntaggedObject::HashTag::update(0, tags);
#endif
to.untag()->tags_ = tags;
}
DART_FORCE_INLINE
ObjectPtr AllocateObject(intptr_t cid, intptr_t size) {
#if defined(DART_COMPRESSED_POINTERS)
const bool compressed = true;
#else
const bool compressed = false;
#endif
return Object::Allocate(cid, size, Heap::kNew, compressed);
}
DART_FORCE_INLINE
void UpdateLengthField(intptr_t cid, ObjectPtr from, ObjectPtr to) {
// We share these objects - never copy them.
ASSERT(!IsStringClassId(cid));
ASSERT(cid != kImmutableArrayCid);
// We update any in-heap variable-sized object with its length to keep the
// length and the size in the object header in sync for the GC.
if (cid == kArrayCid) {
static_cast<UntaggedArray*>(to.untag())->length_ =
static_cast<UntaggedArray*>(from.untag())->length_;
} else if (cid == kContextCid) {
static_cast<UntaggedContext*>(to.untag())->num_variables_ =
static_cast<UntaggedContext*>(from.untag())->num_variables_;
} else if (IsTypedDataClassId(cid)) {
static_cast<UntaggedTypedDataBase*>(to.untag())->length_ =
static_cast<UntaggedTypedDataBase*>(from.untag())->length_;
}
}
void InitializeExternalTypedData(intptr_t cid,
ExternalTypedDataPtr from,
ExternalTypedDataPtr to) {
auto raw_from = from.untag();
auto raw_to = to.untag();
const intptr_t length =
TypedData::ElementSizeInBytes(cid) * Smi::Value(raw_from->length_);
auto buffer = static_cast<uint8_t*>(malloc(length));
memmove(buffer, raw_from->data_, length);
raw_to->length_ = raw_from->length_;
raw_to->data_ = buffer;
}
void InitializeTypedDataView(TypedDataViewPtr obj) {
obj.untag()->typed_data_ = TypedDataBase::null();
obj.untag()->offset_in_bytes_ = 0;
obj.untag()->length_ = 0;
}
void FreeExternalTypedData(void* isolate_callback_data, void* buffer) {
free(buffer);
}
void FreeTransferablePeer(void* isolate_callback_data, void* peer) {
delete static_cast<TransferableTypedDataPeer*>(peer);
}
class ForwardMapBase {
public:
explicit ForwardMapBase(Thread* thread)
: thread_(thread), zone_(thread->zone()), isolate_(thread->isolate()) {}
protected:
friend class ObjectGraphCopier;
intptr_t GetObjectId(ObjectPtr object) {
if (object->IsNewObject()) {
return isolate_->forward_table_new()->GetValueExclusive(object);
} else {
return isolate_->forward_table_old()->GetValueExclusive(object);
}
}
void SetObjectId(ObjectPtr object, intptr_t id) {
if (object->IsNewObject()) {
isolate_->forward_table_new()->SetValueExclusive(object, id);
} else {
isolate_->forward_table_old()->SetValueExclusive(object, id);
}
}
void FinalizeTransferable(const TransferableTypedData& from,
const TransferableTypedData& to) {
// Get the old peer.
auto fpeer = static_cast<TransferableTypedDataPeer*>(
thread_->heap()->GetPeer(from.ptr()));
ASSERT(fpeer != nullptr && fpeer->data() != nullptr);
const intptr_t length = fpeer->length();
// Allocate new peer object with (data, length).
auto tpeer = new TransferableTypedDataPeer(fpeer->data(), length);
thread_->heap()->SetPeer(to.ptr(), tpeer);
// Move the handle itself to the new object.
fpeer->handle()->EnsureFreedExternal(thread_->isolate_group());
tpeer->set_handle(FinalizablePersistentHandle::New(
thread_->isolate_group(), to, tpeer, FreeTransferablePeer, length,
/*auto_delete=*/true));
fpeer->ClearData();
}
void FinalizeExternalTypedData(const ExternalTypedData& to) {
to.AddFinalizer(to.DataAddr(0), &FreeExternalTypedData, to.LengthInBytes());
}
Thread* thread_;
Zone* zone_;
Isolate* isolate_;
private:
DISALLOW_COPY_AND_ASSIGN(ForwardMapBase);
};
class FastForwardMap : public ForwardMapBase {
public:
explicit FastForwardMap(Thread* thread)
: ForwardMapBase(thread),
raw_from_to_(thread->zone(), 20),
raw_transferables_from_to_(thread->zone(), 0),
raw_objects_to_rehash_(thread->zone(), 0),
raw_expandos_to_rehash_(thread->zone(), 0) {
2021-07-13 19:04:20 +00:00
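The BFS copy with a forwarding table described above can be sketched stand-alone. This is a minimal model, not VM code: `Node`, `CopyGraph`, and the use of `std::unordered_map`/`std::vector` as stand-ins for the VM's `WeakTable` and `GrowableArray` are assumptions for illustration. Each to-object is reserved first and written exactly once, while a fill cursor walks the flat `[from, to]` work list:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy object: a node with two child pointers (may be null or shared).
struct Node {
  int value = 0;
  Node* left = nullptr;
  Node* right = nullptr;
};

// Transitive copy in BFS order: the forwarding map (stand-in for the VM's
// WeakTable) maps from-object to to-object; the fill cursor walks the flat
// [from0, to0, from1, to1, ...] work list so each to-object is initialized
// once, with already-forwarded pointers.
Node* CopyGraph(Node* root, std::vector<Node*>* to_space) {
  if (root == nullptr) return nullptr;
  std::unordered_map<Node*, Node*> forwarding;  // from -> to
  std::vector<Node*> from_to;                   // [from, to] pairs

  auto forward = [&](Node* from) -> Node* {
    if (from == nullptr) return nullptr;
    auto it = forwarding.find(from);
    if (it != forwarding.end()) return it->second;
    Node* to = new Node{from->value, nullptr, nullptr};  // reserve to-object
    to_space->push_back(to);
    forwarding[from] = to;
    from_to.push_back(from);
    from_to.push_back(to);
    return to;
  };

  Node* result = forward(root);
  // BFS: keep processing pairs behind the fill cursor until no new ones appear.
  for (size_t fill_cursor = 0; fill_cursor < from_to.size(); fill_cursor += 2) {
    Node* from = from_to[fill_cursor];
    Node* to = from_to[fill_cursor + 1];
    to->left = forward(from->left);
    to->right = forward(from->right);
  }
  return result;
}
```

Because forwarding is keyed on the from-object, sharing inside the graph (the same node reachable via two paths) is preserved in the copy, mirroring how the VM copy keeps object identity within a message.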
raw_from_to_.Resize(2);
raw_from_to_[0] = Object::null();
raw_from_to_[1] = Object::null();
fill_cursor_ = 2;
}
ObjectPtr ForwardedObject(ObjectPtr object) {
const intptr_t id = GetObjectId(object);
if (id == 0) return Marker();
return raw_from_to_[id + 1];
}
void Insert(ObjectPtr from, ObjectPtr to) {
ASSERT(ForwardedObject(from) == Marker());
    ASSERT((raw_from_to_.length() % 2) == 0);  // [from, to] pairs stay aligned.
const auto id = raw_from_to_.length();
SetObjectId(from, id);
raw_from_to_.Resize(id + 2);
raw_from_to_[id] = from;
raw_from_to_[id + 1] = to;
}
void AddTransferable(TransferableTypedDataPtr from,
TransferableTypedDataPtr to) {
raw_transferables_from_to_.Add(from);
raw_transferables_from_to_.Add(to);
}
void AddWeakProperty(WeakPropertyPtr from) { raw_weak_properties_.Add(from); }
[vm] Implement `WeakReference` in the VM

This CL implements `WeakReference` in the VM.

* This reduces the size of weak references from 2 objects using 8 words to 1 object using 4 words.
* This makes loads of weak reference targets a single load instead of two.
* This avoids the fix-point in the GC and message object copying for weak references. (N.b. weak references need to be processed _after_ the fix-point for weak properties.)

The semantics of weak references in messages is that their target gets set to `null` if the target is not included in the message by a strong reference. The tests take particular care to exercise the case where a weak reference's target is only kept alive because a weak property key is alive and it refers to the target in its value. This exercises the fact that weak references need to be processed last.

Does not add support for weak references in the app snapshot. It would be dead code until we start using weak references in, for example, the CFE.

This CL does not try to unify weak references and weak properties in the GC or messaging (as proposed in go/dart-vm-weakreference), because their semantics differ enough.

Closes: https://github.com/dart-lang/sdk/issues/48162

TEST=runtime/tests/vm/dart/finalizer/weak_reference_run_gc_test.dart
TEST=runtime/tests/vm/dart/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/lib/isolate/weak_reference_message_1_test.dart
TEST=tests/lib/isolate/weak_reference_message_2_test.dart

Change-Id: I3810e919a5866f3ae8a95bd9aa23a880a0b0921c
Cq-Include-Trybots: luci.dart.try:app-kernel-linux-debug-x64-try,dart-sdk-mac-arm64-try,vm-canary-linux-debug-try,vm-fuchsia-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-linux-debug-simriscv64-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-x64-try,vm-kernel-nnbd-linux-release-ia32-try,vm-kernel-nnbd-linux-release-simarm64-try,vm-kernel-nnbd-linux-release-simarm-try,vm-kernel-nnbd-mac-debug-arm64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-win-release-ia32-try,vm-kernel-nnbd-win-release-x64-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-android-release-arm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-kernel-win-debug-ia32-try,vm-kernel-win-debug-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/232087
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>

2022-02-10 21:59:41 +00:00
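The message semantics described above — a copied weak reference keeps its target only if a strong edge also reached it, and weak references are resolved only after all strong copying (and the weak-property fix-point) is done — can be modeled stand-alone. `WeakRefModel` and `ResolveWeakRefs` are hypothetical names for illustration, not VM API:

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Model of weak-reference resolution in a copied message: `forwarded` holds
// every object that was strongly copied (from -> to). A weak reference whose
// target was never strongly copied has its target set to null; otherwise it
// is retargeted to the copy. This runs last, after all strong copying.
struct WeakRefModel {
  const void* target;
};

void ResolveWeakRefs(std::vector<WeakRefModel>* refs,
                     const std::unordered_map<const void*, void*>& forwarded) {
  for (WeakRefModel& ref : *refs) {
    auto it = forwarded.find(ref.target);
    ref.target = (it == forwarded.end()) ? nullptr : it->second;
  }
}
```

Running this last matters because a weak-property value processed during the fix-point can still add entries to the forwarded set; resolving weak references earlier could null out a target that would in fact end up in the message.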
void AddWeakReference(WeakReferencePtr from) {
raw_weak_references_.Add(from);
}
void AddExternalTypedData(ExternalTypedDataPtr to) {
raw_external_typed_data_to_.Add(to);
}
void AddObjectToRehash(ObjectPtr to) { raw_objects_to_rehash_.Add(to); }
void AddExpandoToRehash(ObjectPtr to) { raw_expandos_to_rehash_.Add(to); }
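The rehash work lists above exist because copied objects start out with no identity hash code, so a copied hash map's buckets can be stale. As the commit message notes, rehashing is only needed if some key was actually copied; maps whose keys were all shared keep valid hashes. A stand-alone sketch of that decision (`KeyModel` and `NeedsRehash` are made-up names for illustration):

```cpp
#include <cassert>
#include <vector>

// Sketch of the receiver-side rehash decision: a copied map needs rehashing
// only if at least one key object was copied rather than shared, because
// copied objects have no identity hash code yet, while shared keys
// (canonical objects, strings, ...) keep their original hashes.
struct KeyModel {
  bool shared;  // true if the key was shared instead of copied
};

bool NeedsRehash(const std::vector<KeyModel>& keys) {
  for (const KeyModel& key : keys) {
    if (!key.shared) return true;
  }
  return false;
}
```

This is why the common JSON-like case (string/number keys, all shareable) skips the rehash entirely.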
private:
friend class FastObjectCopy;
friend class ObjectGraphCopier;
GrowableArray<ObjectPtr> raw_from_to_;
GrowableArray<TransferableTypedDataPtr> raw_transferables_from_to_;
GrowableArray<ExternalTypedDataPtr> raw_external_typed_data_to_;
GrowableArray<ObjectPtr> raw_objects_to_rehash_;
GrowableArray<ObjectPtr> raw_expandos_to_rehash_;
GrowableArray<WeakPropertyPtr> raw_weak_properties_;
GrowableArray<WeakReferencePtr> raw_weak_references_;
intptr_t fill_cursor_ = 0;
DISALLOW_COPY_AND_ASSIGN(FastForwardMap);
};
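Both the fast map above and the slow map below follow the same protocol: `ForwardedObject` returns a marker for unseen objects, and `Insert` appends a `[from, to]` pair whose even array index doubles as the object's id. A stand-alone model of that protocol (class and member names here are illustrative, with `std::unordered_map` standing in for the `WeakTable` id lookup):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Minimal model of the forwarding-map protocol: the id table returns "not
// present" for unseen objects (the VM's GetObjectId returns 0), slots 0/1
// hold null so real ids start at 2, and the to-object of id `i` lives at
// from_to_[i + 1].
class ForwardMapModel {
 public:
  ForwardMapModel() : from_to_(2, nullptr) {}  // slots 0/1 model Object::null()

  static void* Marker() { return reinterpret_cast<void*>(uintptr_t{1}); }

  void* ForwardedObject(void* object) {
    auto it = ids_.find(object);
    if (it == ids_.end()) return Marker();  // models the id == 0 case
    return from_to_[it->second + 1];
  }

  void Insert(void* from, void* to) {
    assert(ForwardedObject(from) == Marker());
    assert(from_to_.size() % 2 == 0);       // [from, to] pairs stay aligned
    const size_t id = from_to_.size();
    ids_[from] = id;
    from_to_.resize(id + 2);
    from_to_[id] = from;
    from_to_[id + 1] = to;
  }

 private:
  std::unordered_map<void*, size_t> ids_;   // stand-in for the WeakTable
  std::vector<void*> from_to_;
};
```

Keeping from/to in one flat array is what lets the copy loop resume from a fill cursor: everything behind the cursor is fully initialized, everything ahead still needs its pointers forwarded.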
class SlowForwardMap : public ForwardMapBase {
public:
explicit SlowForwardMap(Thread* thread)
: ForwardMapBase(thread),
from_to_(thread->zone(), 20),
transferables_from_to_(thread->zone(), 0) {
from_to_.Resize(2);
from_to_[0] = &Object::null_object();
from_to_[1] = &Object::null_object();
fill_cursor_ = 2;
}
ObjectPtr ForwardedObject(ObjectPtr object) {
const intptr_t id = GetObjectId(object);
if (id == 0) return Marker();
return from_to_[id + 1]->ptr();
}
void Insert(ObjectPtr from, ObjectPtr to) {
ASSERT(ForwardedObject(from) == Marker());
const auto id = from_to_.length();
SetObjectId(from, id);
from_to_.Resize(id + 2);
from_to_[id] = &Object::Handle(Z, from);
from_to_[id + 1] = &Object::Handle(Z, to);
}
void AddTransferable(const TransferableTypedData& from,
const TransferableTypedData& to) {
transferables_from_to_.Add(&TransferableTypedData::Handle(from.ptr()));
transferables_from_to_.Add(&TransferableTypedData::Handle(to.ptr()));
}
void AddWeakProperty(const WeakProperty& from) {
weak_properties_.Add(&WeakProperty::Handle(from.ptr()));
}
void AddWeakReference(const WeakReference& from) {
weak_references_.Add(&WeakReference::Handle(from.ptr()));
}
void AddExternalTypedData(ExternalTypedDataPtr to) {
external_typed_data_.Add(&ExternalTypedData::Handle(to));
}
void AddObjectToRehash(const Object& to) {
objects_to_rehash_.Add(&Object::Handle(to.ptr()));
}
void AddExpandoToRehash(const Object& to) {
expandos_to_rehash_.Add(&Object::Handle(to.ptr()));
}
void FinalizeTransferables() {
for (intptr_t i = 0; i < transferables_from_to_.length(); i += 2) {
auto from = transferables_from_to_[i];
auto to = transferables_from_to_[i + 1];
FinalizeTransferable(*from, *to);
}
}
void FinalizeExternalTypedData() {
for (intptr_t i = 0; i < external_typed_data_.length(); i++) {
auto to = external_typed_data_[i];
ForwardMapBase::FinalizeExternalTypedData(*to);
}
}
private:
friend class SlowObjectCopy;
friend class ObjectGraphCopier;
GrowableArray<const Object*> from_to_;
GrowableArray<const TransferableTypedData*> transferables_from_to_;
GrowableArray<const ExternalTypedData*> external_typed_data_;
GrowableArray<const Object*> objects_to_rehash_;
GrowableArray<const Object*> expandos_to_rehash_;
GrowableArray<const WeakProperty*> weak_properties_;
GrowableArray<const WeakReference*> weak_references_;
// Index into [from_to_] up to which the to-objects' contents have
// already been filled in.
intptr_t fill_cursor_ = 0;
DISALLOW_COPY_AND_ASSIGN(SlowForwardMap);
};
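// The SlowForwardMap above keeps each (from, to) pair as two adjacent
// entries of a single handle array, with slots 0/1 reserved so that id 0
// can act as the "not forwarded" marker. A minimal standalone sketch of
// that pairing scheme follows — std::vector stands in for the VM's
// GrowableArray of handles and std::unordered_map for the heap's
// WeakTable of object ids; both stand-ins are illustrative assumptions,
// not VM types.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Sketch: parallel from/to forwarding table. Ids are indices into
// from_to_; an absent id means "not yet forwarded" (slots 0/1 reserved).
class ForwardingSketch {
 public:
  ForwardingSketch() : from_to_(2, nullptr) {}  // Reserved marker pair.

  // Returns the copy of `from`, or nullptr if it was not forwarded yet.
  const void* Forwarded(const void* from) const {
    auto it = ids_.find(from);
    if (it == ids_.end()) return nullptr;
    return from_to_[it->second + 1];
  }

  void Insert(const void* from, const void* to) {
    assert(Forwarded(from) == nullptr);
    const std::size_t id = from_to_.size();
    ids_[from] = id;           // WeakTable stand-in: object -> id.
    from_to_.push_back(from);  // Even slot: source object.
    from_to_.push_back(to);    // Odd slot: its copy.
  }

 private:
  std::vector<const void*> from_to_;
  std::unordered_map<const void*, std::size_t> ids_;
};
```

// Keeping from and to in one array lets the copy phase walk unfilled
// to-objects with a single cursor (cf. fill_cursor_) instead of
// iterating to-space, which the real algorithm deliberately avoids.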
class ObjectCopyBase {
public:
explicit ObjectCopyBase(Thread* thread)
: thread_(thread),
heap_base_(thread->heap_base()),
zone_(thread->zone()),
heap_(thread->isolate_group()->heap()),
class_table_(thread->isolate_group()->class_table()),
new_space_(heap_->new_space()),
tmp_(Object::Handle(thread->zone())),
expando_cid_(Class::GetClassId(
thread->isolate_group()->object_store()->expando_class())) {}
~ObjectCopyBase() {}
protected:
static ObjectPtr LoadPointer(ObjectPtr src, intptr_t offset) {
return src.untag()->LoadPointer(reinterpret_cast<ObjectPtr*>(
reinterpret_cast<uint8_t*>(src.untag()) + offset));
}
static CompressedObjectPtr LoadCompressedPointer(ObjectPtr src,
intptr_t offset) {
return src.untag()->LoadPointer(reinterpret_cast<CompressedObjectPtr*>(
reinterpret_cast<uint8_t*>(src.untag()) + offset));
}
static compressed_uword LoadCompressedNonPointerWord(ObjectPtr src,
intptr_t offset) {
return *reinterpret_cast<compressed_uword*>(
reinterpret_cast<uint8_t*>(src.untag()) + offset);
}
static void StorePointerBarrier(ObjectPtr obj,
intptr_t offset,
ObjectPtr value) {
obj.untag()->StorePointer(
reinterpret_cast<ObjectPtr*>(reinterpret_cast<uint8_t*>(obj.untag()) +
offset),
value);
}
static void StoreCompressedPointerBarrier(ObjectPtr obj,
intptr_t offset,
ObjectPtr value) {
obj.untag()->StoreCompressedPointer(
reinterpret_cast<CompressedObjectPtr*>(
reinterpret_cast<uint8_t*>(obj.untag()) + offset),
value);
}
void StoreCompressedLargeArrayPointerBarrier(ObjectPtr obj,
intptr_t offset,
ObjectPtr value) {
obj.untag()->StoreCompressedArrayPointer(
reinterpret_cast<CompressedObjectPtr*>(
reinterpret_cast<uint8_t*>(obj.untag()) + offset),
value, thread_);
}
static void StorePointerNoBarrier(ObjectPtr obj,
intptr_t offset,
ObjectPtr value) {
*reinterpret_cast<ObjectPtr*>(reinterpret_cast<uint8_t*>(obj.untag()) +
offset) = value;
}
template <typename T = ObjectPtr>
static void StoreCompressedPointerNoBarrier(ObjectPtr obj,
intptr_t offset,
T value) {
*reinterpret_cast<CompressedObjectPtr*>(
reinterpret_cast<uint8_t*>(obj.untag()) + offset) = value;
}
static void StoreCompressedNonPointerWord(ObjectPtr obj,
intptr_t offset,
compressed_uword value) {
*reinterpret_cast<compressed_uword*>(
reinterpret_cast<uint8_t*>(obj.untag()) + offset) = value;
}
DART_FORCE_INLINE
bool CanCopyObject(uword tags, ObjectPtr object) {
const auto cid = UntaggedObject::ClassIdTag::decode(tags);
if (cid > kNumPredefinedCids) {
const bool has_native_fields =
Class::NumNativeFieldsOf(class_table_->At(cid)) != 0;
if (has_native_fields) {
exception_msg_ =
OS::SCreate(zone_,
"Illegal argument in isolate message: (object extends "
"NativeWrapper - %s)",
Class::Handle(class_table_->At(cid)).ToCString());
return false;
}
return true;
}
#define HANDLE_ILLEGAL_CASE(Type) \
case k##Type##Cid: { \
exception_msg_ = \
"Illegal argument in isolate message: " \
"(object is a " #Type ")"; \
return false; \
}
switch (cid) {
// From "dart:ffi" we handle only Pointer/DynamicLibrary specially, since
// those are the only non-abstract classes (so we avoid checking other
// cids here that cannot occur in practice).
HANDLE_ILLEGAL_CASE(DynamicLibrary)
HANDLE_ILLEGAL_CASE(MirrorReference)
HANDLE_ILLEGAL_CASE(Pointer)
HANDLE_ILLEGAL_CASE(ReceivePort)
HANDLE_ILLEGAL_CASE(UserTag)
default:
return true;
}
}
Thread* thread_;
uword heap_base_;
Zone* zone_;
Heap* heap_;
ClassTable* class_table_;
Scavenger* new_space_;
Object& tmp_;
intptr_t expando_cid_;
const char* exception_msg_ = nullptr;
};
class FastObjectCopyBase : public ObjectCopyBase {
public:
using Types = PtrTypes;
explicit FastObjectCopyBase(Thread* thread)
: ObjectCopyBase(thread), fast_forward_map_(thread) {}
protected:
DART_FORCE_INLINE
void ForwardCompressedPointers(ObjectPtr src,
ObjectPtr dst,
intptr_t offset,
intptr_t end_offset) {
for (; offset < end_offset; offset += kCompressedWordSize) {
ForwardCompressedPointer(src, dst, offset);
}
}
DART_FORCE_INLINE
void ForwardCompressedPointers(ObjectPtr src,
ObjectPtr dst,
intptr_t offset,
intptr_t end_offset,
UnboxedFieldBitmap bitmap) {
if (bitmap.IsEmpty()) {
ForwardCompressedPointers(src, dst, offset, end_offset);
return;
}
intptr_t bit = offset >> kCompressedWordSizeLog2;
for (; offset < end_offset; offset += kCompressedWordSize) {
if (bitmap.Get(bit++)) {
StoreCompressedNonPointerWord(
dst, offset, LoadCompressedNonPointerWord(src, offset));
} else {
ForwardCompressedPointer(src, dst, offset);
}
}
}
void ForwardCompressedArrayPointers(intptr_t array_length,
ObjectPtr src,
ObjectPtr dst,
intptr_t offset,
intptr_t end_offset) {
for (; offset < end_offset; offset += kCompressedWordSize) {
ForwardCompressedPointer(src, dst, offset);
}
}
void ForwardCompressedContextPointers(intptr_t context_length,
ObjectPtr src,
ObjectPtr dst,
intptr_t offset,
intptr_t end_offset) {
for (; offset < end_offset; offset += kCompressedWordSize) {
ForwardCompressedPointer(src, dst, offset);
}
}
DART_FORCE_INLINE
void ForwardCompressedPointer(ObjectPtr src, ObjectPtr dst, intptr_t offset) {
auto value = LoadCompressedPointer(src, offset);
if (!value.IsHeapObject()) {
StoreCompressedPointerNoBarrier(dst, offset, value);
return;
}
auto value_decompressed = value.Decompress(heap_base_);
const uword tags = TagsFromUntaggedObject(value_decompressed.untag());
if (CanShareObject(value_decompressed, tags)) {
StoreCompressedPointerNoBarrier(dst, offset, value);
return;
}
ObjectPtr existing_to =
fast_forward_map_.ForwardedObject(value_decompressed);
if (existing_to != Marker()) {
StoreCompressedPointerNoBarrier(dst, offset, existing_to);
return;
}
if (UNLIKELY(!CanCopyObject(tags, value_decompressed))) {
ASSERT(exception_msg_ != nullptr);
StoreCompressedPointerNoBarrier(dst, offset, Object::null());
return;
}
auto to = Forward(tags, value_decompressed);
StoreCompressedPointerNoBarrier(dst, offset, to);
}
  // Reserves space for [from] in new space and records the forwarding
  // pointer in [fast_forward_map_]. Returns [Marker()] and sets
  // [exception_msg_] if the fast-path allocation fails.
  ObjectPtr Forward(uword tags, ObjectPtr from) {
const intptr_t header_size = UntaggedObject::SizeTag::decode(tags);
const auto cid = UntaggedObject::ClassIdTag::decode(tags);
const uword size =
header_size != 0 ? header_size : from.untag()->HeapSize();
if (Heap::IsAllocatableInNewSpace(size)) {
const uword alloc = new_space_->TryAllocate(thread_, size);
if (alloc != 0) {
ObjectPtr to(reinterpret_cast<UntaggedObject*>(alloc));
fast_forward_map_.Insert(from, to);
if (IsExternalTypedDataClassId(cid)) {
SetNewSpaceTaggingWord(to, cid, header_size);
InitializeExternalTypedData(cid, ExternalTypedData::RawCast(from),
ExternalTypedData::RawCast(to));
fast_forward_map_.AddExternalTypedData(
ExternalTypedData::RawCast(to));
} else if (IsTypedDataViewClassId(cid)) {
        // We set the view's backing store to `null` to satisfy an assertion
        // in GCCompactor::VisitTypedDataViewPointers().
SetNewSpaceTaggingWord(to, cid, header_size);
InitializeTypedDataView(TypedDataView::RawCast(to));
}
return to;
}
}
exception_msg_ = kFastAllocationFailed;
return Marker();
}
void EnqueueTransferable(TransferableTypedDataPtr from,
TransferableTypedDataPtr to) {
fast_forward_map_.AddTransferable(from, to);
}
void EnqueueWeakProperty(WeakPropertyPtr from) {
fast_forward_map_.AddWeakProperty(from);
}
void EnqueueWeakReference(WeakReferencePtr from) {
fast_forward_map_.AddWeakReference(from);
}
void EnqueueObjectToRehash(ObjectPtr to) {
fast_forward_map_.AddObjectToRehash(to);
}
void EnqueueExpandoToRehash(ObjectPtr to) {
fast_forward_map_.AddExpandoToRehash(to);
}
static void StoreCompressedArrayPointers(intptr_t array_length,
ObjectPtr src,
ObjectPtr dst,
intptr_t offset,
intptr_t end_offset) {
StoreCompressedPointers(src, dst, offset, end_offset);
}
static void StoreCompressedPointers(ObjectPtr src,
ObjectPtr dst,
intptr_t offset,
intptr_t end_offset) {
StoreCompressedPointersNoBarrier(src, dst, offset, end_offset);
}
static void StoreCompressedPointersNoBarrier(ObjectPtr src,
ObjectPtr dst,
intptr_t offset,
intptr_t end_offset) {
for (; offset <= end_offset; offset += kCompressedWordSize) {
StoreCompressedPointerNoBarrier(dst, offset,
LoadCompressedPointer(src, offset));
}
}
protected:
friend class ObjectGraphCopier;
FastForwardMap fast_forward_map_;
};
class SlowObjectCopyBase : public ObjectCopyBase {
public:
using Types = HandleTypes;
explicit SlowObjectCopyBase(Thread* thread)
: ObjectCopyBase(thread), slow_forward_map_(thread) {}
protected:
DART_FORCE_INLINE
void ForwardCompressedPointers(const Object& src,
const Object& dst,
intptr_t offset,
intptr_t end_offset) {
for (; offset < end_offset; offset += kCompressedWordSize) {
ForwardCompressedPointer(src, dst, offset);
}
}
  // Copies the compressed words in [offset, end_offset). Words whose bit is
  // set in [bitmap] are unboxed fields and are copied verbatim; all others
  // are object pointers and get forwarded.
  DART_FORCE_INLINE
  void ForwardCompressedPointers(const Object& src,
const Object& dst,
intptr_t offset,
intptr_t end_offset,
UnboxedFieldBitmap bitmap) {
intptr_t bit = offset >> kCompressedWordSizeLog2;
for (; offset < end_offset; offset += kCompressedWordSize) {
if (bitmap.Get(bit++)) {
StoreCompressedNonPointerWord(
dst.ptr(), offset, LoadCompressedNonPointerWord(src.ptr(), offset));
} else {
ForwardCompressedPointer(src, dst, offset);
}
}
}
  // Forwards array elements. Arrays large enough to use card marking are
  // allocated in old space and need the card-table-aware store path.
  void ForwardCompressedArrayPointers(intptr_t array_length,
const Object& src,
const Object& dst,
intptr_t offset,
intptr_t end_offset) {
if (Array::UseCardMarkingForAllocation(array_length)) {
for (; offset < end_offset; offset += kCompressedWordSize) {
ForwardCompressedLargeArrayPointer(src, dst, offset);
}
} else {
for (; offset < end_offset; offset += kCompressedWordSize) {
ForwardCompressedPointer(src, dst, offset);
}
}
}
void ForwardCompressedContextPointers(intptr_t context_length,
const Object& src,
const Object& dst,
intptr_t offset,
intptr_t end_offset) {
for (; offset < end_offset; offset += kCompressedWordSize) {
ForwardCompressedPointer(src, dst, offset);
}
}
// Forwards the compressed pointer at [src] + [offset] into [dst] for slots of
// large (card-remembered) arrays, which require the large-array variant of
// the store barrier.
DART_FORCE_INLINE
void ForwardCompressedLargeArrayPointer(const Object& src,
const Object& dst,
intptr_t offset) {
auto value = LoadCompressedPointer(src.ptr(), offset);
if (!value.IsHeapObject()) {
StoreCompressedPointerNoBarrier(dst.ptr(), offset, value);
return;
}
auto value_decompressed = value.Decompress(heap_base_);
const uword tags = TagsFromUntaggedObject(value_decompressed.untag());
if (CanShareObject(value_decompressed, tags)) {
StoreCompressedLargeArrayPointerBarrier(dst.ptr(), offset,
value_decompressed);
return;
}
ObjectPtr existing_to =
slow_forward_map_.ForwardedObject(value_decompressed);
if (existing_to != Marker()) {
StoreCompressedLargeArrayPointerBarrier(dst.ptr(), offset, existing_to);
return;
}
if (UNLIKELY(!CanCopyObject(tags, value_decompressed))) {
ASSERT(exception_msg_ != nullptr);
StoreCompressedLargeArrayPointerBarrier(dst.ptr(), offset,
Object::null());
return;
}
tmp_ = value_decompressed;
tmp_ = Forward(tags, tmp_); // Only this can cause allocation.
StoreCompressedLargeArrayPointerBarrier(dst.ptr(), offset, tmp_.ptr());
}
// Forwards the compressed pointer at [src] + [offset] into [dst]: shares the
// target object if possible, reuses an existing forwarding entry, or copies
// the object via [Forward] (the only step that can allocate).
DART_FORCE_INLINE
void ForwardCompressedPointer(const Object& src,
const Object& dst,
intptr_t offset) {
auto value = LoadCompressedPointer(src.ptr(), offset);
if (!value.IsHeapObject()) {
StoreCompressedPointerNoBarrier(dst.ptr(), offset, value);
return;
}
auto value_decompressed = value.Decompress(heap_base_);
const uword tags = TagsFromUntaggedObject(value_decompressed.untag());
if (CanShareObject(value_decompressed, tags)) {
StoreCompressedPointerBarrier(dst.ptr(), offset, value_decompressed);
return;
}
ObjectPtr existing_to =
slow_forward_map_.ForwardedObject(value_decompressed);
if (existing_to != Marker()) {
StoreCompressedPointerBarrier(dst.ptr(), offset, existing_to);
return;
}
if (UNLIKELY(!CanCopyObject(tags, value_decompressed))) {
ASSERT(exception_msg_ != nullptr);
StoreCompressedPointerNoBarrier(dst.ptr(), offset, Object::null());
return;
}
tmp_ = value_decompressed;
tmp_ = Forward(tags, tmp_); // Only this can cause allocation.
StoreCompressedPointerBarrier(dst.ptr(), offset, tmp_.ptr());
}
// Allocates the to-space copy of [from], records the forwarding entry, and
// performs class-specific initialization (large arrays, external typed data,
// typed data views).
ObjectPtr Forward(uword tags, const Object& from) {
const intptr_t cid = UntaggedObject::ClassIdTag::decode(tags);
intptr_t size = UntaggedObject::SizeTag::decode(tags);
if (size == 0) {
size = from.ptr().untag()->HeapSize();
}
ObjectPtr to = AllocateObject(cid, size);
slow_forward_map_.Insert(from.ptr(), to);
UpdateLengthField(cid, from.ptr(), to);
if (cid == kArrayCid && !Heap::IsAllocatableInNewSpace(size)) {
to.untag()->SetCardRememberedBitUnsynchronized();
}
if (IsExternalTypedDataClassId(cid)) {
InitializeExternalTypedData(cid, ExternalTypedData::RawCast(from.ptr()),
ExternalTypedData::RawCast(to));
slow_forward_map_.AddExternalTypedData(ExternalTypedData::RawCast(to));
} else if (IsTypedDataViewClassId(cid)) {
// We set the view's backing store to `null` to satisfy an assertion in
// GCCompactor::VisitTypedDataViewPointers().
InitializeTypedDataView(TypedDataView::RawCast(to));
}
return to;
}
void EnqueueTransferable(const TransferableTypedData& from,
const TransferableTypedData& to) {
slow_forward_map_.AddTransferable(from, to);
}
void EnqueueWeakProperty(const WeakProperty& from) {
slow_forward_map_.AddWeakProperty(from);
}
void EnqueueWeakReference(const WeakReference& from) {
slow_forward_map_.AddWeakReference(from);
}
void EnqueueObjectToRehash(const Object& to) {
slow_forward_map_.AddObjectToRehash(to);
}
void EnqueueExpandoToRehash(const Object& to) {
slow_forward_map_.AddExpandoToRehash(to);
}
void StoreCompressedArrayPointers(intptr_t array_length,
const Object& src,
const Object& dst,
intptr_t offset,
intptr_t end_offset) {
auto src_ptr = src.ptr();
auto dst_ptr = dst.ptr();
// Arrays large enough to use card marking are remembered per card rather
// than via the store buffer, so their element stores need the dedicated
// large-array barrier variant.
if (Array::UseCardMarkingForAllocation(array_length)) {
for (; offset <= end_offset; offset += kCompressedWordSize) {
StoreCompressedLargeArrayPointerBarrier(
dst_ptr, offset,
LoadCompressedPointer(src_ptr, offset).Decompress(heap_base_));
}
} else {
for (; offset <= end_offset; offset += kCompressedWordSize) {
StoreCompressedPointerBarrier(
dst_ptr, offset,
LoadCompressedPointer(src_ptr, offset).Decompress(heap_base_));
}
}
}
void StoreCompressedPointers(const Object& src,
const Object& dst,
intptr_t offset,
intptr_t end_offset) {
auto src_ptr = src.ptr();
auto dst_ptr = dst.ptr();
for (; offset <= end_offset; offset += kCompressedWordSize) {
StoreCompressedPointerBarrier(
dst_ptr, offset,
LoadCompressedPointer(src_ptr, offset).Decompress(heap_base_));
}
}
// Barrier-free stores: only used for slots that cannot contain heap
// pointers that need remembering (e.g. Smi-valued length and hash fields).
static void StoreCompressedPointersNoBarrier(const Object& src,
const Object& dst,
intptr_t offset,
intptr_t end_offset) {
auto src_ptr = src.ptr();
auto dst_ptr = dst.ptr();
for (; offset <= end_offset; offset += kCompressedWordSize) {
StoreCompressedPointerNoBarrier(dst_ptr, offset,
LoadCompressedPointer(src_ptr, offset));
}
}
protected:
friend class ObjectGraphCopier;
SlowForwardMap slow_forward_map_;
};
template <typename Base>
class ObjectCopy : public Base {
public:
using Types = typename Base::Types;
explicit ObjectCopy(Thread* thread) : Base(thread) {}
void CopyPredefinedInstance(typename Types::Object from,
typename Types::Object to,
intptr_t cid) {
if (IsImplicitFieldClassId(cid)) {
CopyUserdefinedInstance(from, to);
return;
}
switch (cid) {
#define COPY_TO(clazz) \
case clazz::kClassId: { \
typename Types::clazz casted_from = Types::Cast##clazz(from); \
typename Types::clazz casted_to = Types::Cast##clazz(to); \
Copy##clazz(casted_from, casted_to); \
return; \
}
CLASS_LIST_NO_OBJECT_NOR_STRING_NOR_ARRAY_NOR_MAP(COPY_TO)
COPY_TO(Array)
COPY_TO(GrowableObjectArray)
COPY_TO(LinkedHashMap)
COPY_TO(LinkedHashSet)
#undef COPY_TO
#define COPY_TO(clazz) case kTypedData##clazz##Cid:
CLASS_LIST_TYPED_DATA(COPY_TO) {
typename Types::TypedData casted_from = Types::CastTypedData(from);
typename Types::TypedData casted_to = Types::CastTypedData(to);
CopyTypedData(casted_from, casted_to);
return;
}
#undef COPY_TO
case kByteDataViewCid:
#define COPY_TO(clazz) case kTypedData##clazz##ViewCid:
CLASS_LIST_TYPED_DATA(COPY_TO) {
typename Types::TypedDataView casted_from =
Types::CastTypedDataView(from);
typename Types::TypedDataView casted_to =
Types::CastTypedDataView(to);
CopyTypedDataView(casted_from, casted_to);
return;
}
#undef COPY_TO
#define COPY_TO(clazz) case kExternalTypedData##clazz##Cid:
CLASS_LIST_TYPED_DATA(COPY_TO) {
typename Types::ExternalTypedData casted_from =
Types::CastExternalTypedData(from);
typename Types::ExternalTypedData casted_to =
Types::CastExternalTypedData(to);
CopyExternalTypedData(casted_from, casted_to);
return;
}
#undef COPY_TO
default:
break;
}
const Object& obj = Types::HandlifyObject(from);
FATAL1("Unexpected object: %s\n", obj.ToCString());
}
#if defined(DART_PRECOMPILED_RUNTIME)
void CopyUserdefinedInstanceAOT(typename Types::Object from,
typename Types::Object to,
UnboxedFieldBitmap bitmap) {
const intptr_t instance_size = UntagObject(from)->HeapSize();
Base::ForwardCompressedPointers(from, to, kWordSize, instance_size, bitmap);
}
#endif
void CopyUserdefinedInstance(typename Types::Object from,
typename Types::Object to) {
const intptr_t instance_size = UntagObject(from)->HeapSize();
Base::ForwardCompressedPointers(from, to, kWordSize, instance_size);
}
void CopyClosure(typename Types::Closure from, typename Types::Closure to) {
Base::StoreCompressedPointers(
from, to, OFFSET_OF(UntaggedClosure, instantiator_type_arguments_),
OFFSET_OF(UntaggedClosure, function_));
Base::ForwardCompressedPointer(from, to,
OFFSET_OF(UntaggedClosure, context_));
// The hash is null or a Smi, so no write barrier is required.
Base::StoreCompressedPointersNoBarrier(from, to,
OFFSET_OF(UntaggedClosure, hash_),
OFFSET_OF(UntaggedClosure, hash_));
ONLY_IN_PRECOMPILED(UntagClosure(to)->entry_point_ =
UntagClosure(from)->entry_point_);
}
void CopyContext(typename Types::Context from, typename Types::Context to) {
const intptr_t length = Context::NumVariables(Types::GetContextPtr(from));
UntagContext(to)->num_variables_ = UntagContext(from)->num_variables_;
Base::ForwardCompressedPointer(from, to,
OFFSET_OF(UntaggedContext, parent_));
Base::ForwardCompressedContextPointers(
length, from, to, Context::variable_offset(0),
Context::variable_offset(0) + Context::kBytesPerElement * length);
}
void CopyArray(typename Types::Array from, typename Types::Array to) {
const intptr_t length = Smi::Value(UntagArray(from)->length());
Base::StoreCompressedArrayPointers(
length, from, to, OFFSET_OF(UntaggedArray, type_arguments_),
OFFSET_OF(UntaggedArray, type_arguments_));
Base::StoreCompressedPointersNoBarrier(from, to,
OFFSET_OF(UntaggedArray, length_),
OFFSET_OF(UntaggedArray, length_));
Base::ForwardCompressedArrayPointers(
length, from, to, Array::data_offset(),
Array::data_offset() + kCompressedWordSize * length);
}
void CopyGrowableObjectArray(typename Types::GrowableObjectArray from,
typename Types::GrowableObjectArray to) {
Base::StoreCompressedPointers(
from, to, OFFSET_OF(UntaggedGrowableObjectArray, type_arguments_),
OFFSET_OF(UntaggedGrowableObjectArray, type_arguments_));
Base::StoreCompressedPointersNoBarrier(
from, to, OFFSET_OF(UntaggedGrowableObjectArray, length_),
OFFSET_OF(UntaggedGrowableObjectArray, length_));
Base::ForwardCompressedPointer(
from, to, OFFSET_OF(UntaggedGrowableObjectArray, data_));
}
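// The Copy* routines above follow a common pattern (sketch, not a separate
// code path): Smi-valued fields are stored without barriers, fields that
// reference shared objects (e.g. canonical type arguments) are stored with
// the normal barrier, and fields referencing objects that must themselves
// be copied go through ForwardCompressedPointer, which enqueues the target
// so the transitive copy proceeds in BFS order.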
template <intptr_t one_for_set_two_for_map, typename T>
void CopyLinkedHashBase(T from,
T to,
UntaggedLinkedHashBase* from_untagged,
UntaggedLinkedHashBase* to_untagged) {
    // We have to find out whether the map needs re-hashing on the receiver
    // side because copied keys may have different hash codes (e.g. due to a
    // user-defined hashCode implementation, or because copied objects start
    // out with fresh identity hash codes).
bool needs_rehashing = false;
ArrayPtr data = from_untagged->data_.Decompress(Base::heap_base_);
if (data != Array::null()) {
UntaggedArray* untagged_data = data.untag();
const intptr_t length = Smi::Value(untagged_data->length_);
auto key_value_pairs = untagged_data->data();
for (intptr_t i = 0; i < length; i += one_for_set_two_for_map) {
ObjectPtr key = key_value_pairs[i].Decompress(Base::heap_base_);
const bool is_deleted_entry = key == data;
if (key->IsHeapObject()) {
if (!is_deleted_entry && MightNeedReHashing(key)) {
needs_rehashing = true;
break;
}
}
}
}
Base::StoreCompressedPointers(
from, to, OFFSET_OF(UntaggedLinkedHashBase, type_arguments_),
OFFSET_OF(UntaggedLinkedHashBase, type_arguments_));
    // Compared with the snapshot-based (de)serializer, we preserve the same
    // backing store (i.e. used_data/deleted_keys/data) and therefore do not
    // magically shrink the backing store based on usage.
    //
    // We do this to avoid making assumptions about the object graph and the
    // linked hash map (e.g. assuming there are no other references to the
    // data, or assuming the linked hashmap is in a consistent state).
if (needs_rehashing) {
to_untagged->hash_mask_ = Smi::New(0);
to_untagged->index_ = TypedData::RawCast(Object::null());
to_untagged->deleted_keys_ = Smi::New(0);
Base::EnqueueObjectToRehash(to);
}
// From this point on we shouldn't use the raw pointers, since GC might
// happen when forwarding objects.
from_untagged = nullptr;
to_untagged = nullptr;
if (!needs_rehashing) {
Base::ForwardCompressedPointer(from, to,
OFFSET_OF(UntaggedLinkedHashBase, index_));
Base::StoreCompressedPointersNoBarrier(
from, to, OFFSET_OF(UntaggedLinkedHashBase, hash_mask_),
OFFSET_OF(UntaggedLinkedHashBase, hash_mask_));
Base::StoreCompressedPointersNoBarrier(
from, to, OFFSET_OF(UntaggedLinkedHashMap, deleted_keys_),
OFFSET_OF(UntaggedLinkedHashMap, deleted_keys_));
}
Base::ForwardCompressedPointer(from, to,
OFFSET_OF(UntaggedLinkedHashBase, data_));
Base::StoreCompressedPointersNoBarrier(
from, to, OFFSET_OF(UntaggedLinkedHashBase, used_data_),
OFFSET_OF(UntaggedLinkedHashBase, used_data_));
}
void CopyLinkedHashMap(typename Types::LinkedHashMap from,
typename Types::LinkedHashMap to) {
CopyLinkedHashBase<2, typename Types::LinkedHashMap>(
from, to, UntagLinkedHashMap(from), UntagLinkedHashMap(to));
}
2021-07-13 19:04:20 +00:00
void CopyLinkedHashSet(typename Types::LinkedHashSet from,
typename Types::LinkedHashSet to) {
CopyLinkedHashBase<1, typename Types::LinkedHashSet>(
from, to, UntagLinkedHashSet(from), UntagLinkedHashSet(to));
}
void CopyDouble(typename Types::Double from, typename Types::Double to) {
#if !defined(DART_PRECOMPILED_RUNTIME)
auto raw_from = UntagDouble(from);
auto raw_to = UntagDouble(to);
raw_to->value_ = raw_from->value_;
#else
// Will be shared and not copied.
UNREACHABLE();
#endif
}
void CopyFloat32x4(typename Types::Float32x4 from,
typename Types::Float32x4 to) {
#if !defined(DART_PRECOMPILED_RUNTIME)
auto raw_from = UntagFloat32x4(from);
auto raw_to = UntagFloat32x4(to);
raw_to->value_[0] = raw_from->value_[0];
raw_to->value_[1] = raw_from->value_[1];
raw_to->value_[2] = raw_from->value_[2];
raw_to->value_[3] = raw_from->value_[3];
#else
// Will be shared and not copied.
UNREACHABLE();
#endif
}
void CopyFloat64x2(typename Types::Float64x2 from,
typename Types::Float64x2 to) {
#if !defined(DART_PRECOMPILED_RUNTIME)
auto raw_from = UntagFloat64x2(from);
auto raw_to = UntagFloat64x2(to);
raw_to->value_[0] = raw_from->value_[0];
raw_to->value_[1] = raw_from->value_[1];
#else
// Will be shared and not copied.
UNREACHABLE();
#endif
}
void CopyTypedData(typename Types::TypedData from,
typename Types::TypedData to) {
auto raw_from = UntagTypedData(from);
auto raw_to = UntagTypedData(to);
const intptr_t cid = Types::GetTypedDataPtr(from)->GetClassId();
raw_to->length_ = raw_from->length_;
raw_to->RecomputeDataField();
const intptr_t length =
TypedData::ElementSizeInBytes(cid) * Smi::Value(raw_from->length_);
memmove(raw_to->data_, raw_from->data_, length);
}
void CopyTypedDataView(typename Types::TypedDataView from,
typename Types::TypedDataView to) {
// This will forward & initialize the typed data.
Base::ForwardCompressedPointer(
from, to, OFFSET_OF(UntaggedTypedDataView, typed_data_));
auto raw_from = UntagTypedDataView(from);
auto raw_to = UntagTypedDataView(to);
raw_to->length_ = raw_from->length_;
raw_to->offset_in_bytes_ = raw_from->offset_in_bytes_;
raw_to->data_ = nullptr;
auto forwarded_backing_store =
raw_to->typed_data_.Decompress(Base::heap_base_);
if (forwarded_backing_store == Marker() ||
forwarded_backing_store == Object::null()) {
// Ensure the backing store is never "sentinel" - the scavenger doesn't
// like it.
Base::StoreCompressedPointerNoBarrier(
Types::GetTypedDataViewPtr(to),
OFFSET_OF(UntaggedTypedDataView, typed_data_), Object::null());
raw_to->length_ = 0;
raw_to->offset_in_bytes_ = 0;
ASSERT(Base::exception_msg_ != nullptr);
return;
}
const bool is_external =
raw_from->data_ != raw_from->DataFieldForInternalTypedData();
if (is_external) {
// The raw_to is fully initialized at this point (see handling of external
// typed data in [ForwardCompressedPointer]).
raw_to->RecomputeDataField();
} else {
// The raw_to isn't initialized yet, but its address is valid, so we can
// compute the data field it would use.
raw_to->RecomputeDataFieldForInternalTypedData();
}
const bool is_external2 =
raw_to->data_ != raw_to->DataFieldForInternalTypedData();
ASSERT(is_external == is_external2);
}
void CopyExternalTypedData(typename Types::ExternalTypedData from,
typename Types::ExternalTypedData to) {
// The external typed data is initialized on the forwarding pass (where
// normally allocation but not initialization happens), so views on it
// can be initialized immediately.
#if defined(DEBUG)
auto raw_from = UntagExternalTypedData(from);
auto raw_to = UntagExternalTypedData(to);
ASSERT(raw_to->data_ != nullptr);
ASSERT(raw_to->length_ == raw_from->length_);
#endif
}
void CopyTransferableTypedData(typename Types::TransferableTypedData from,
typename Types::TransferableTypedData to) {
// The [TransferableTypedData] is an empty object with an associated heap
// peer object.
// -> We'll validate that there's a peer and enqueue the transferable to be
// transferred if the transitive copy is successful.
auto fpeer = static_cast<TransferableTypedDataPeer*>(
Base::heap_->GetPeer(Types::GetTransferableTypedDataPtr(from)));
ASSERT(fpeer != nullptr);
if (fpeer->data() == nullptr) {
Base::exception_msg_ =
"Illegal argument in isolate message"
" : (TransferableTypedData has been transferred already)";
return;
}
Base::EnqueueTransferable(from, to);
}
void CopyWeakProperty(typename Types::WeakProperty from,
typename Types::WeakProperty to) {
// We store `null`s as keys/values and let the main algorithm know that
// we should check reachability of the key again after the fixpoint (if it
// became reachable, forward the key/value).
Base::StoreCompressedPointerNoBarrier(Types::GetWeakPropertyPtr(to),
OFFSET_OF(UntaggedWeakProperty, key_),
Object::null());
Base::StoreCompressedPointerNoBarrier(
Types::GetWeakPropertyPtr(to), OFFSET_OF(UntaggedWeakProperty, value_),
Object::null());
// To satisfy some ASSERT()s in GC we'll use Object::null() explicitly here.
Base::StoreCompressedPointerNoBarrier(
Types::GetWeakPropertyPtr(to), OFFSET_OF(UntaggedWeakProperty, next_),
Object::null());
Base::EnqueueWeakProperty(from);
}
void CopyWeakReference(typename Types::WeakReference from,
typename Types::WeakReference to) {
// We store `null` as target and let the main algorithm know that
// we should check reachability of the target again after the fixpoint (if
// it became reachable, forward the target).
Base::StoreCompressedPointerNoBarrier(
Types::GetWeakReferencePtr(to),
OFFSET_OF(UntaggedWeakReference, target_), Object::null());
// Type arguments should always be copied.
Base::ForwardCompressedPointer(
from, to, OFFSET_OF(UntaggedWeakReference, type_arguments_));
// To satisfy some ASSERT()s in GC we'll use Object::null() explicitly here.
Base::StoreCompressedPointerNoBarrier(
Types::GetWeakReferencePtr(to), OFFSET_OF(UntaggedWeakReference, next_),
Object::null());
Base::EnqueueWeakReference(from);
}
#define DEFINE_UNSUPPORTED(clazz) \
void Copy##clazz(typename Types::clazz from, typename Types::clazz to) { \
FATAL("Objects of type " #clazz " should not occur in object graphs"); \
}
FOR_UNSUPPORTED_CLASSES(DEFINE_UNSUPPORTED)
#undef DEFINE_UNSUPPORTED
UntaggedObject* UntagObject(typename Types::Object obj) {
return Types::GetObjectPtr(obj).Decompress(Base::heap_base_).untag();
}
#define DO(V) \
DART_FORCE_INLINE \
Untagged##V* Untag##V(typename Types::V obj) { \
return Types::Get##V##Ptr(obj).Decompress(Base::heap_base_).untag(); \
}
CLASS_LIST_FOR_HANDLES(DO)
#undef DO
};
class FastObjectCopy : public ObjectCopy<FastObjectCopyBase> {
public:
explicit FastObjectCopy(Thread* thread) : ObjectCopy(thread) {}
~FastObjectCopy() {}
ObjectPtr TryCopyGraphFast(ObjectPtr root) {
NoSafepointScope no_safepoint_scope;
ObjectPtr root_copy = Forward(TagsFromUntaggedObject(root.untag()), root);
if (root_copy == Marker()) {
return root_copy;
}
auto& from_weak_property = WeakProperty::Handle(zone_);
auto& to_weak_property = WeakProperty::Handle(zone_);
auto& weak_property_key = Object::Handle(zone_);
while (true) {
if (fast_forward_map_.fill_cursor_ ==
fast_forward_map_.raw_from_to_.length()) {
break;
}
// Run fixpoint to copy all objects.
while (fast_forward_map_.fill_cursor_ <
fast_forward_map_.raw_from_to_.length()) {
const intptr_t index = fast_forward_map_.fill_cursor_;
ObjectPtr from = fast_forward_map_.raw_from_to_[index];
ObjectPtr to = fast_forward_map_.raw_from_to_[index + 1];
FastCopyObject(from, to);
if (exception_msg_ != nullptr) {
return root_copy;
}
fast_forward_map_.fill_cursor_ += 2;
}
// Possibly forward values of [WeakProperty]s if keys became reachable.
intptr_t i = 0;
auto& weak_properties = fast_forward_map_.raw_weak_properties_;
while (i < weak_properties.length()) {
from_weak_property = weak_properties[i];
weak_property_key =
fast_forward_map_.ForwardedObject(from_weak_property.key());
if (weak_property_key.ptr() != Marker()) {
to_weak_property ^=
fast_forward_map_.ForwardedObject(from_weak_property.ptr());
// The key became reachable so we'll change the forwarded
// [WeakProperty]'s key to the new key (it is `null` at this point).
to_weak_property.set_key(weak_property_key);
// Since the key has become strongly reachable in the copied graph,
// we'll also need to forward the value.
ForwardCompressedPointer(from_weak_property.ptr(),
to_weak_property.ptr(),
OFFSET_OF(UntaggedWeakProperty, value_));
// We don't need to process this [WeakProperty] again.
const intptr_t last = weak_properties.length() - 1;
if (i < last) {
weak_properties[i] = weak_properties[last];
weak_properties.SetLength(last);
continue;
}
}
i++;
}
}
  // After the fix point with [WeakProperty]s, process [WeakReference]s.
  // A forwarded [WeakReference]'s target starts out as `null` and is only
  // set if the target turned out to be strongly reachable, so this must
  // run after the weak property fix point has completed.
auto& from_weak_reference = WeakReference::Handle(zone_);
auto& to_weak_reference = WeakReference::Handle(zone_);
auto& weak_reference_target = Object::Handle(zone_);
auto& weak_references = fast_forward_map_.raw_weak_references_;
for (intptr_t i = 0; i < weak_references.length(); i++) {
from_weak_reference = weak_references[i];
weak_reference_target =
fast_forward_map_.ForwardedObject(from_weak_reference.target());
if (weak_reference_target.ptr() != Marker()) {
to_weak_reference ^=
fast_forward_map_.ForwardedObject(from_weak_reference.ptr());
// The target became reachable so we'll change the forwarded
// [WeakReference]'s target to the new target (it is `null` at this
// point).
to_weak_reference.set_target(weak_reference_target);
}
}
if (root_copy != Marker()) {
ObjectPtr array;
array = TryBuildArrayOfObjectsToRehash(
fast_forward_map_.raw_objects_to_rehash_);
if (array == Marker()) return root_copy;
raw_objects_to_rehash_ = Array::RawCast(array);
array = TryBuildArrayOfObjectsToRehash(
fast_forward_map_.raw_expandos_to_rehash_);
if (array == Marker()) return root_copy;
raw_expandos_to_rehash_ = Array::RawCast(array);
}
return root_copy;
}
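  // Builds a new-space [Array] holding the given objects so their maps/sets
  // can be re-hashed on the receiver side. Returns `null` if the list is
  // empty, and the [Marker] sentinel (after recording
  // [kFastAllocationFailed]) if the fast-path new-space allocation fails.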
ObjectPtr TryBuildArrayOfObjectsToRehash(
const GrowableArray<ObjectPtr>& objects_to_rehash) {
const intptr_t length = objects_to_rehash.length();
if (length == 0) return Object::null();
[vm/concurrency] Implement a fast transitive object copy for isolate message passing We use message passing as comunication mechanism between isolates. The transitive closure of an object to be sent is currently serialized into a snapshot form and deserialized on the receiver side. Furthermore the receiver side will re-hash any linked hashmaps in that graph. If isolate gropus are enabled we have all isolates in a group work on the same heap. That removes the need to use an intermediate serialization format. It also removes the need for an O(n) step on the receiver side. This CL implements a fast transitive object copy implementation and makes use of it a message that is to be passed to another isolate stays within the same isolate group. In the common case the object graph will fit into new space. So the copy algorithm will try to take advantage of it by having a fast path and a fallback path. Both of them effectively copy the graph in BFS order. The algorithm works effectively like a scavenge operation, but instead of first copying the from-object to the to-space and then re-writing the object in to-space to forward the pointers (which requires us writing to the to-space memory twice), we only reserve space for to-objects and then initialize the to-objects to it's final contents, including forwarded pointers (i.e. write the to-space object only once). Compared with a scavenge operation (which stores forwarding pointers in the objects themselves), we use a [WeakTable] to store them. This is the only remaining expensive part of the algorithm and could be further optimized. To avoid relying on iterating the to-space, we'll remember [from, to] addresses. => All of this works inside a [NoSafepointOperationScope] and avoids usages of handles as well as write barriers. While doing the transitive object copy, we'll share any object we can safely share (canonical objects, strings, sendports, ...) instead of copying it. 
If the fast path fails (due to allocation failure or hitting) we'll handlify any raw pointers and continue almost the same algorithm in a safe way, where GC is possible at every object allocation site and normal barriers are used for any stores of object pointers. The copy algorithm uses templates to share the copy logic between the fast and slow case (same copy routines can work on raw pointers as well as handles). There's a few special things to take into consideration: * If we copy a view on external typed data we need to know the external typed data address to compute the inner pointer of the view, so we'll eagerly initialize external typed data. * All external typed data needs to get a finalizer attached (irrespective if the object copy suceeds or not) to ensure the `malloc()`ed data is freed again. * Transferables will only be transferred on successful transitive copies. Also they need to attach finalizers to objects (which requires all objects be in handles). * We copy linked hashmaps as they are - instead of compressing the data by removing deleted entries. We may need to re-hash those hashmaps on the receiver side (similar to the snapshot-based copy approach) since new object graph will have no identity hash codes assigned to them. Though if the hashmaps only has sharable objects as keys (very common, e.g. json) there is no need for re-hashing. 
It changes the SendPort.* benchmarks as follows:

```
Benchmark                                   | default            | IG                   | IG + FOC
----------------------------------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw):              | 0.25 us (1 x)      | 0.26 us (0.96 x)     | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw):        | 4.15 us (1 x)      | 1.45 us (2.86 x)     | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw):         | 82.16 us (1 x)     | 27.17 us (3.02 x)    | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw):        | 784.70 us (1 x)    | 242.10 us (3.24 x)   | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw):       | 8510.4 us (1 x)    | 3083.80 us (2.76 x)  | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw):         | 122381.33 us (1 x) | 62959.40 us (1.94 x) | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw):     | 1.91 us (1 x)      | 0.92 us (2.08 x)     | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw):     | 6.32 us (1 x)      | 2.70 us (2.34 x)     | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw):     | 25.24 us (1 x)     | 10.47 us (2.41 x)    | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw):     | 104.08 us (1 x)    | 41.08 us (2.53 x)    | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw):    | 373.39 us (1 x)    | 174.11 us (2.14 x)   | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw):    | 1588.64 us (1 x)   | 893.18 us (1.78 x)   | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw):    | 6849.55 us (1 x)   | 3705.19 us (1.85 x)  | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw):           | 0.67 us (1 x)      | 0.69 us (0.97 x)     | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw):     | 4.37 us (1 x)      | 0.78 us (5.60 x)     | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw):      | 45.67 us (1 x)     | 0.90 us (50.74 x)    | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw):     | 498.81 us (1 x)    | 1.24 us (402.27 x)   | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw):    | 5366.02 us (1 x)   | 4.22 us (1271.57 x)  | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw):      | 101050.88 us (1 x) | 20.81 us (4855.88 x) | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw):  | 3.91 us (1 x)      | 0.76 us (5.14 x)     | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw):  | 9.90 us (1 x)      | 0.79 us (12.53 x)    | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw):  | 33.09 us (1 x)     | 0.87 us (38.03 x)    | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw):  | 126.77 us (1 x)    | 0.92 us (137.79 x)   | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x)    | 0.94 us (567.12 x)   | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x)   | 3.03 us (733.74 x)   | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x)   | 4.03 us (2219.77 x)  | 4.30 us (2080.39 x)
```

Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
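The copy scheme the commit message describes (BFS over the from-graph, forwarding pointers kept in a side table instead of in the objects themselves, each to-object written only once, and shareable objects reused rather than copied) can be illustrated with a small stand-alone sketch. This is not VM code: `Node`, `TransitiveCopy`, and the `std::unordered_map` standing in for the VM's `WeakTable` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy heap object with two pointer slots. "shareable" stands in for
// canonical objects/strings/sendports, which the VM shares instead of copying.
struct Node {
  int value;
  bool shareable;
  Node* left;
  Node* right;
};

// Transitive copy in BFS order. The unordered_map plays the role of the
// VM's WeakTable of forwarding pointers: unlike a scavenge, the forwarding
// information lives outside the objects, so each to-object slot is written
// exactly once, already containing the forwarded pointer.
Node* TransitiveCopy(Node* root, std::vector<Node*>* new_objects) {
  std::unordered_map<Node*, Node*> forward;  // from-object -> to-object
  std::vector<Node*> worklist;               // BFS frontier of from-objects
  auto forward_ptr = [&](Node* from) -> Node* {
    if (from == nullptr) return nullptr;
    if (from->shareable) return from;        // share, don't copy
    auto it = forward.find(from);
    if (it != forward.end()) return it->second;
    Node* to = new Node{from->value, false, nullptr, nullptr};
    new_objects->push_back(to);
    forward.emplace(from, to);
    worklist.push_back(from);                // visit its children later
    return to;
  };
  Node* result = forward_ptr(root);
  for (size_t i = 0; i < worklist.size(); ++i) {  // worklist grows during loop
    Node* from = worklist[i];
    Node* to = forward[from];
    to->left = forward_ptr(from->left);      // forwards or enqueues children
    to->right = forward_ptr(from->right);
  }
  return result;
}
```

Because forwarding is a pure lookup, cycles and sharing in the from-graph are preserved in the copy; the real implementation additionally distinguishes a raw-pointer fast path from a handle-based slow path, which this sketch does not model.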
const intptr_t size = Array::InstanceSize(length);
const uword array_addr = new_space_->TryAllocate(thread_, size);
if (array_addr == 0) {
exception_msg_ = kFastAllocationFailed;
return Marker();
}
const uword header_size =
UntaggedObject::SizeTag::SizeFits(size) ? size : 0;
ArrayPtr array(reinterpret_cast<UntaggedArray*>(array_addr));
SetNewSpaceTaggingWord(array, kArrayCid, header_size);
StoreCompressedPointerNoBarrier(array, OFFSET_OF(UntaggedArray, length_),
Smi::New(length));
StoreCompressedPointerNoBarrier(array,
OFFSET_OF(UntaggedArray, type_arguments_),
TypeArguments::null());
auto array_data = array.untag()->data();
for (intptr_t i = 0; i < length; ++i) {
array_data[i] = objects_to_rehash[i];
}
return array;
}
private:
friend class ObjectGraphCopier;
void FastCopyObject(ObjectPtr from, ObjectPtr to) {
const uword tags = TagsFromUntaggedObject(from.untag());
const intptr_t cid = UntaggedObject::ClassIdTag::decode(tags);
const intptr_t size = UntaggedObject::SizeTag::decode(tags);
// Ensure the last word is GC-safe (our heap objects are 2-word aligned, the
// object header stores the size in multiples of kObjectAlignment, the GC
// uses the information from the header and therefore might visit one slot
// more than the actual size of the instance).
*reinterpret_cast<ObjectPtr*>(UntaggedObject::ToAddr(to) +
from.untag()->HeapSize() - kWordSize) = 0;
SetNewSpaceTaggingWord(to, cid, size);
// Fall back to virtual variant for predefined classes
if (cid < kNumPredefinedCids && cid != kInstanceCid) {
CopyPredefinedInstance(from, to, cid);
return;
}
#if defined(DART_PRECOMPILED_RUNTIME)
const auto bitmap =
class_table_->shared_class_table()->GetUnboxedFieldsMapAt(cid);
CopyUserdefinedInstanceAOT(Instance::RawCast(from), Instance::RawCast(to),
bitmap);
#else
CopyUserdefinedInstance(Instance::RawCast(from), Instance::RawCast(to));
#endif
if (cid == expando_cid_) {
EnqueueExpandoToRehash(to);
}
}
ArrayPtr raw_objects_to_rehash_ = Array::null();
ArrayPtr raw_expandos_to_rehash_ = Array::null();
};
class SlowObjectCopy : public ObjectCopy<SlowObjectCopyBase> {
public:
explicit SlowObjectCopy(Thread* thread)
: ObjectCopy(thread),
objects_to_rehash_(Array::Handle(thread->zone())),
expandos_to_rehash_(Array::Handle(thread->zone())) {}
2021-07-13 19:04:20 +00:00
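The forwarding-map-based BFS copy described above can be illustrated with a minimal standalone sketch (not VM code; `Node`, `CopyGraph`, and the `std::unordered_map` standing in for the `WeakTable` are all hypothetical): space for each to-object is reserved once, [from, to] pairs are remembered in a worklist, and each to-object is then initialized exactly once with its pointers already forwarded.

```cpp
// Hypothetical sketch of a BFS graph copy that keeps forwarding information
// in an external map (the role the WeakTable plays above) instead of in the
// objects themselves, and records [from, to] pairs so no to-space iteration
// is needed.
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

struct Node {
  int value;
  Node* left = nullptr;
  Node* right = nullptr;
};

Node* CopyGraph(Node* root) {
  std::unordered_map<Node*, Node*> forward;      // external forwarding map
  std::vector<std::pair<Node*, Node*>> from_to;  // remembered [from, to] pairs

  // Reserve a to-object for `from` (if not already forwarded) and record it.
  auto forward_node = [&](Node* from) -> Node* {
    if (from == nullptr) return nullptr;
    auto it = forward.find(from);
    if (it != forward.end()) return it->second;
    Node* to = new Node{from->value};
    forward[from] = to;
    from_to.emplace_back(from, to);
    return to;
  };

  Node* root_copy = forward_node(root);
  // Fill cursor: each to-object is written exactly once, with its pointer
  // fields already forwarded (no second rewrite pass over to-space).
  for (size_t cursor = 0; cursor < from_to.size(); cursor++) {
    auto [from, to] = from_to[cursor];
    to->left = forward_node(from->left);
    to->right = forward_node(from->right);
  }
  return root_copy;
}
```

Because forwarding is looked up before allocating, shared subgraphs (and cycles) are copied once and the sharing structure is preserved in the copy.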
~SlowObjectCopy() {}
ObjectPtr ContinueCopyGraphSlow(const Object& root,
const Object& fast_root_copy) {
auto& root_copy = Object::Handle(Z, fast_root_copy.ptr());
if (root_copy.ptr() == Marker()) {
root_copy = Forward(TagsFromUntaggedObject(root.ptr().untag()), root);
}
WeakProperty& weak_property = WeakProperty::Handle(Z);
Object& from = Object::Handle(Z);
Object& to = Object::Handle(Z);
while (true) {
if (slow_forward_map_.fill_cursor_ ==
slow_forward_map_.from_to_.length()) {
break;
}
// Run fixpoint to copy all objects.
while (slow_forward_map_.fill_cursor_ <
slow_forward_map_.from_to_.length()) {
const intptr_t index = slow_forward_map_.fill_cursor_;
from = slow_forward_map_.from_to_[index]->ptr();
to = slow_forward_map_.from_to_[index + 1]->ptr();
CopyObject(from, to);
slow_forward_map_.fill_cursor_ += 2;
if (exception_msg_ != nullptr) {
return Marker();
}
}
// Possibly forward values of [WeakProperty]s if keys became reachable.
intptr_t i = 0;
auto& weak_properties = slow_forward_map_.weak_properties_;
while (i < weak_properties.length()) {
const auto& from_weak_property = *weak_properties[i];
to = slow_forward_map_.ForwardedObject(from_weak_property.key());
if (to.ptr() != Marker()) {
weak_property ^=
slow_forward_map_.ForwardedObject(from_weak_property.ptr());
// The key became reachable so we'll change the forwarded
// [WeakProperty]'s key to the new key (it is `null` at this point).
weak_property.set_key(to);
// Since the key has become strongly reachable in the copied graph,
// we'll also need to forward the value.
ForwardCompressedPointer(from_weak_property, weak_property,
OFFSET_OF(UntaggedWeakProperty, value_));
// We don't need to process this [WeakProperty] again.
const intptr_t last = weak_properties.length() - 1;
if (i < last) {
weak_properties[i] = weak_properties[last];
weak_properties.SetLength(last);
continue;
}
}
i++;
}
}
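The weak-property fixpoint above can be sketched in miniature (hypothetical standalone code, not VM code; `Obj` and `CopyReachable` are illustrative stand-ins): a value is only forwarded once its key is reachable in the copied graph, and forwarding a value can in turn make further weak keys reachable, so the loop repeats until no progress is made. Handled entries are dropped with the same swap-remove trick as the code above.

```cpp
// Hypothetical sketch of the weak-property fixpoint: alternate between
// copying reachable objects and promoting weak-pair values whose keys
// became reachable, until a pass makes no progress.
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Obj = int;  // stand-in for object identity

std::set<Obj> CopyReachable(const std::vector<Obj>& roots,
                            std::vector<std::pair<Obj, Obj>> weak_pairs) {
  std::set<Obj> copied(roots.begin(), roots.end());
  bool progress = true;
  while (progress) {
    progress = false;
    for (size_t i = 0; i < weak_pairs.size();) {
      auto [key, value] = weak_pairs[i];
      if (copied.count(key) != 0) {
        // Key became reachable: forward the value too, and drop this entry
        // via swap-remove so it is not processed again.
        progress |= copied.insert(value).second;
        weak_pairs[i] = weak_pairs.back();
        weak_pairs.pop_back();
      } else {
        i++;
      }
    }
  }
  return copied;
}
```

With roots `{1}` and weak pairs `{(2,3), (1,2), (5,6)}`, the first pass promotes 2 (key 1 is reachable), the second pass promotes 3 (key 2 became reachable), and 5/6 stay out of the copy — the chain is only resolved because the loop runs to a fixpoint.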
[vm] Implement `WeakReference` in the VM

This CL implements `WeakReference` in the VM.

* This reduces the size of weak references from 2 objects using 8 words to 1 object using 4 words.
* This makes loads of weak reference targets a single load instead of two.
* This avoids the fix-point in the GC and message object copying for weak references. (N.b. Weak references need to be processed _after_ the fix-point for weak properties.)

The semantics of weak references in messages is that their target gets set to `null` if the target is not included in the message by a strong reference. The tests take particular care to exercise the case where a weak reference's target is only kept alive because a weak property key is alive and it refers to the target in its value. This exercises the fact that weak references need to be processed last.

Does not add support for weak references in the app snapshot. It would be dead code until we start using weak references in for example the CFE.

This CL does not try to unify weak references and weak properties in the GC or messaging (as proposed in go/dart-vm-weakreference), because their semantics differ enough.

Closes: https://github.com/dart-lang/sdk/issues/48162

TEST=runtime/tests/vm/dart/finalizer/weak_reference_run_gc_test.dart
TEST=runtime/tests/vm/dart/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/lib/isolate/weak_reference_message_1_test.dart
TEST=tests/lib/isolate/weak_reference_message_2_test.dart

Change-Id: I3810e919a5866f3ae8a95bd9aa23a880a0b0921c
Cq-Include-Trybots: luci.dart.try:app-kernel-linux-debug-x64-try,dart-sdk-mac-arm64-try,vm-canary-linux-debug-try,vm-fuchsia-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-linux-debug-simriscv64-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-x64-try,vm-kernel-nnbd-linux-release-ia32-try,vm-kernel-nnbd-linux-release-simarm64-try,vm-kernel-nnbd-linux-release-simarm-try,vm-kernel-nnbd-mac-debug-arm64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-win-release-ia32-try,vm-kernel-nnbd-win-release-x64-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-android-release-arm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-kernel-win-debug-ia32-try,vm-kernel-win-debug-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/232087
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>

2022-02-10 21:59:41 +00:00
// After the fix point with [WeakProperty]s do [WeakReference]s.
WeakReference& weak_reference = WeakReference::Handle(Z);
auto& weak_references = slow_forward_map_.weak_references_;
for (intptr_t i = 0; i < weak_references.length(); i++) {
const auto& from_weak_reference = *weak_references[i];
to = slow_forward_map_.ForwardedObject(from_weak_reference.target());
if (to.ptr() != Marker()) {
weak_reference ^=
slow_forward_map_.ForwardedObject(from_weak_reference.ptr());
// The target became reachable so we'll change the forwarded
// [WeakReference]'s target to the new target (it is `null` at this
// point).
weak_reference.set_target(to);
}
}
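The ordering constraint exercised above — weak references are resolved only after the weak-property fixpoint, because a late-copied weak-property value may be the only thing keeping a target alive — can be sketched as a final pass over the copied set (hypothetical standalone code; `Obj` and `ResolveWeakRefs` are illustrative stand-ins, with `std::nullopt` playing the role of the `null` target):

```cpp
// Hypothetical sketch of the final weak-reference pass: targets that did
// not make it into the copied graph are cleared to "null" (nullopt).
#include <optional>
#include <set>
#include <vector>

using Obj = int;  // stand-in for object identity

std::vector<std::optional<Obj>> ResolveWeakRefs(
    const std::set<Obj>& copied, const std::vector<Obj>& targets) {
  std::vector<std::optional<Obj>> out;
  for (Obj target : targets) {
    if (copied.count(target) != 0) {
      out.push_back(target);         // target is in the message: keep it
    } else {
      out.push_back(std::nullopt);   // target not in the message: cleared
    }
  }
  return out;
}
```

Running this pass before the weak-property fixpoint completes could wrongly clear a target that a weak-property value was still going to pull into the copy, which is why the VM code above does it last.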
objects_to_rehash_ =
BuildArrayOfObjectsToRehash(slow_forward_map_.objects_to_rehash_);
expandos_to_rehash_ =
BuildArrayOfObjectsToRehash(slow_forward_map_.expandos_to_rehash_);
return root_copy.ptr();
}
ArrayPtr BuildArrayOfObjectsToRehash(
const GrowableArray<const Object*>& objects_to_rehash) {
const intptr_t length = objects_to_rehash.length();
if (length == 0) return Array::null();
const auto& array = Array::Handle(zone_, Array::New(length));
for (intptr_t i = 0; i < length; ++i) {
array.SetAt(i, *objects_to_rehash[i]);
}
return array.ptr();
}
private:
friend class ObjectGraphCopier;
void CopyObject(const Object& from, const Object& to) {
const auto cid = from.GetClassId();
// Fall back to the virtual variant for predefined classes.
if (cid < kNumPredefinedCids && cid != kInstanceCid) {
CopyPredefinedInstance(from, to, cid);
return;
}
#if defined(DART_PRECOMPILED_RUNTIME)
const auto bitmap =
class_table_->shared_class_table()->GetUnboxedFieldsMapAt(cid);
CopyUserdefinedInstanceAOT(from, to, bitmap);
#else
CopyUserdefinedInstance(from, to);
#endif
if (cid == expando_cid_) {
EnqueueExpandoToRehash(to);
}
}
Array& objects_to_rehash_;
Array& expandos_to_rehash_;
};
class ObjectGraphCopier {
public:
explicit ObjectGraphCopier(Thread* thread)
: thread_(thread),
zone_(thread->zone()),
fast_object_copy_(thread_),
slow_object_copy_(thread_) {
thread_->isolate()->set_forward_table_new(new WeakTable());
thread_->isolate()->set_forward_table_old(new WeakTable());
}
~ObjectGraphCopier() {
thread_->isolate()->set_forward_table_new(nullptr);
thread_->isolate()->set_forward_table_old(nullptr);
}
// Result will be
// [
// <message>,
// <collection-lib-objects-to-rehash>,
// <core-lib-objects-to-rehash>,
// ]
ObjectPtr CopyObjectGraph(const Object& root) {
const char* volatile exception_msg = nullptr;
auto& result = Object::Handle(zone_);
{
LongJumpScope jump; // e.g. for OOMs.
if (setjmp(*jump.Set()) == 0) {
result = CopyObjectGraphInternal(root, &exception_msg);
// Any allocated external typed data must have finalizers attached so
// memory will get free()ed.
slow_object_copy_.slow_forward_map_.FinalizeExternalTypedData();
} else {
// Any allocated external typed data must have finalizers attached so
// memory will get free()ed.
slow_object_copy_.slow_forward_map_.FinalizeExternalTypedData();
// The copy failed due to a non-application error (e.g. an OOM error);
// propagate this error.
result = thread_->StealStickyError();
RELEASE_ASSERT(result.IsError());
}
}
if (result.IsError()) {
Exceptions::PropagateError(Error::Cast(result));
UNREACHABLE();
}
if (result.ptr() == Marker()) {
ASSERT(exception_msg != nullptr);
ThrowException(exception_msg);
UNREACHABLE();
}
// The copy was successful, so detach the transferable data from the
// sender and attach it to the copied graph.
slow_object_copy_.slow_forward_map_.FinalizeTransferables();
return result.ptr();
}
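The control flow above (establish a resume point, run the copy, and perform the cleanup on both the normal path and the long-jump path) can be sketched with plain `setjmp`/`longjmp`, which is the machinery `LongJumpScope` wraps. All names below are illustrative; `Cleanup()` plays the role of `FinalizeExternalTypedData()`, which must run whether or not the copy succeeds.

```cpp
#include <cassert>
#include <csetjmp>

static std::jmp_buf g_jump;
static int g_cleanup_runs = 0;

// Must run on success AND failure (like attaching external-typed-data
// finalizers so malloc()ed backing stores are always freed).
static void Cleanup() { ++g_cleanup_runs; }

// Stand-in for the graph copy; a failure exits non-locally (like an OOM).
static int DoCopy(bool fail) {
  if (fail) std::longjmp(g_jump, 1);
  return 42;
}

static int CopyWithCleanup(bool fail, bool* error) {
  // volatile: locals modified between setjmp and longjmp must be volatile
  // to keep defined values after the jump (cf. `volatile exception_msg`).
  volatile int result = 0;
  if (setjmp(g_jump) == 0) {
    result = DoCopy(fail);
    Cleanup();
    *error = false;
  } else {
    Cleanup();
    *error = true;  // caller propagates the error
  }
  return result;
}
```

Note that `longjmp` must not skip over C++ objects with non-trivial destructors; the sketch (like the VM's raw-pointer fast path) keeps the jumped-over region free of such objects.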
private:
ObjectPtr CopyObjectGraphInternal(const Object& root,
const char* volatile* exception_msg) {
const auto& result_array = Array::Handle(zone_, Array::New(3));
if (!root.ptr()->IsHeapObject()) {
result_array.SetAt(0, root);
return result_array.ptr();
}
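The `IsHeapObject()` early-out above works because an immediate (a Smi) is encoded in the pointer word itself, so "copying" it is just storing the same word; there is no heap object to traverse. A minimal sketch of such a low-bit tagging scheme follows (the constants and helpers are assumptions of this sketch, not the VM's exact encoding):

```cpp
#include <cassert>
#include <cstdint>

using ObjectWord = std::uintptr_t;

// Assumed scheme: low bit 1 marks a heap object pointer, low bit 0 an
// immediate small integer stored in the remaining bits.
constexpr ObjectWord kTagMask = 1;
constexpr ObjectWord kHeapObjectTag = 1;

inline bool IsHeapObjectWord(ObjectWord w) {
  return (w & kTagMask) == kHeapObjectTag;
}

inline ObjectWord SmiFromInt(intptr_t v) {
  return static_cast<ObjectWord>(v) << 1;  // low bit 0: immediate
}

inline intptr_t SmiToInt(ObjectWord w) {
  return static_cast<intptr_t>(w) >> 1;  // arithmetic shift restores sign
}
```

Under this scheme a Smi word can be placed directly into the result array with no allocation, which is exactly the fast exit the code above takes.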
const uword tags = TagsFromUntaggedObject(root.ptr().untag());
if (CanShareObject(root.ptr(), tags)) {
[vm/concurrency] Implement a fast transitive object copy for isolate message passing We use message passing as comunication mechanism between isolates. The transitive closure of an object to be sent is currently serialized into a snapshot form and deserialized on the receiver side. Furthermore the receiver side will re-hash any linked hashmaps in that graph. If isolate gropus are enabled we have all isolates in a group work on the same heap. That removes the need to use an intermediate serialization format. It also removes the need for an O(n) step on the receiver side. This CL implements a fast transitive object copy implementation and makes use of it a message that is to be passed to another isolate stays within the same isolate group. In the common case the object graph will fit into new space. So the copy algorithm will try to take advantage of it by having a fast path and a fallback path. Both of them effectively copy the graph in BFS order. The algorithm works effectively like a scavenge operation, but instead of first copying the from-object to the to-space and then re-writing the object in to-space to forward the pointers (which requires us writing to the to-space memory twice), we only reserve space for to-objects and then initialize the to-objects to it's final contents, including forwarded pointers (i.e. write the to-space object only once). Compared with a scavenge operation (which stores forwarding pointers in the objects themselves), we use a [WeakTable] to store them. This is the only remaining expensive part of the algorithm and could be further optimized. To avoid relying on iterating the to-space, we'll remember [from, to] addresses. => All of this works inside a [NoSafepointOperationScope] and avoids usages of handles as well as write barriers. While doing the transitive object copy, we'll share any object we can safely share (canonical objects, strings, sendports, ...) instead of copying it. 
If the fast path fails (due to allocation failure or hitting) we'll handlify any raw pointers and continue almost the same algorithm in a safe way, where GC is possible at every object allocation site and normal barriers are used for any stores of object pointers. The copy algorithm uses templates to share the copy logic between the fast and slow case (same copy routines can work on raw pointers as well as handles). There's a few special things to take into consideration: * If we copy a view on external typed data we need to know the external typed data address to compute the inner pointer of the view, so we'll eagerly initialize external typed data. * All external typed data needs to get a finalizer attached (irrespective if the object copy suceeds or not) to ensure the `malloc()`ed data is freed again. * Transferables will only be transferred on successful transitive copies. Also they need to attach finalizers to objects (which requires all objects be in handles). * We copy linked hashmaps as they are - instead of compressing the data by removing deleted entries. We may need to re-hash those hashmaps on the receiver side (similar to the snapshot-based copy approach) since new object graph will have no identity hash codes assigned to them. Though if the hashmaps only has sharable objects as keys (very common, e.g. json) there is no need for re-hashing. 
It changes the SendPort.* benchmarks as follows:

```
Benchmark                                   | default            | IG                   | IG + FOC
----------------------------------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw):              | 0.25 us (1 x)      | 0.26 us (0.96 x)     | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw):        | 4.15 us (1 x)      | 1.45 us (2.86 x)     | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw):         | 82.16 us (1 x)     | 27.17 us (3.02 x)    | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw):        | 784.70 us (1 x)    | 242.10 us (3.24 x)   | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw):       | 8510.4 us (1 x)    | 3083.80 us (2.76 x)  | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw):         | 122381.33 us (1 x) | 62959.40 us (1.94 x) | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw):     | 1.91 us (1 x)      | 0.92 us (2.08 x)     | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw):     | 6.32 us (1 x)      | 2.70 us (2.34 x)     | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw):     | 25.24 us (1 x)     | 10.47 us (2.41 x)    | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw):     | 104.08 us (1 x)    | 41.08 us (2.53 x)    | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw):    | 373.39 us (1 x)    | 174.11 us (2.14 x)   | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw):    | 1588.64 us (1 x)   | 893.18 us (1.78 x)   | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw):    | 6849.55 us (1 x)   | 3705.19 us (1.85 x)  | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw):           | 0.67 us (1 x)      | 0.69 us (0.97 x)     | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw):     | 4.37 us (1 x)      | 0.78 us (5.60 x)     | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw):      | 45.67 us (1 x)     | 0.90 us (50.74 x)    | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw):     | 498.81 us (1 x)    | 1.24 us (402.27 x)   | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw):    | 5366.02 us (1 x)   | 4.22 us (1271.57 x)  | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw):      | 101050.88 us (1 x) | 20.81 us (4855.88 x) | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw):  | 3.91 us (1 x)      | 0.76 us (5.14 x)     | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw):  | 9.90 us (1 x)      | 0.79 us (12.53 x)    | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw):  | 33.09 us (1 x)     | 0.87 us (38.03 x)    | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw):  | 126.77 us (1 x)    | 0.92 us (137.79 x)   | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x)    | 0.94 us (567.12 x)   | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x)   | 3.03 us (733.74 x)   | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x)   | 4.03 us (2219.77 x)  | 4.30 us (2080.39 x)
```

Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test

Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
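The scavenge-like copy with an external forwarding table described in the commit message above can be sketched outside the VM as a toy model. Here `Node` stands in for a heap object, `std::unordered_map` stands in for the VM's [WeakTable], and all names are illustrative rather than VM APIs; the point is that each to-space object is initialized exactly once, with its pointers already forwarded:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy heap object: a value plus two outgoing pointers.
struct Node {
  int value;
  Node* left;
  Node* right;
};

// BFS copy of the graph rooted at `root`. Forwarding pointers live in an
// external map (the VM uses a WeakTable), not in the from-objects, so each
// to-space object is written exactly once with its final contents.
Node* CopyGraph(Node* root, std::vector<Node*>* to_space) {
  std::unordered_map<Node*, Node*> forward;  // from-object -> to-object
  std::vector<Node*> worklist;  // from-objects whose copy still needs init

  auto forward_to = [&](Node* from) -> Node* {
    if (from == nullptr) return nullptr;
    auto it = forward.find(from);
    if (it != forward.end()) return it->second;  // already has a to-object
    Node* to = new Node();  // only reserve space here, initialize later
    to_space->push_back(to);
    forward[from] = to;
    worklist.push_back(from);
    return to;
  };

  Node* to_root = forward_to(root);
  for (size_t i = 0; i < worklist.size(); i++) {  // BFS order
    Node* from = worklist[i];
    Node* to = forward[from];
    // Initialize the to-object to its final contents in a single pass,
    // forwarding the outgoing pointers as we go.
    to->value = from->value;
    to->left = forward_to(from->left);
    to->right = forward_to(from->right);
  }
  return to_root;
}
```

Because the forwarding map also serves as the visited set, shared sub-objects are copied once and cycles terminate naturally, mirroring how the real copy preserves the shape of the sent graph.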
result_array.SetAt(0, root);
return result_array.ptr();
}
if (!fast_object_copy_.CanCopyObject(tags, root.ptr())) {
ASSERT(fast_object_copy_.exception_msg_ != nullptr);
*exception_msg = fast_object_copy_.exception_msg_;
return Marker();
}
// We try a fast new-space only copy first that will not use any barriers.
auto& result = Object::Handle(Z, Marker());
// All allocated but non-initialized heap objects have to be made GC-visible
// at this point.
if (FLAG_enable_fast_object_copy) {
{
NoSafepointScope no_safepoint_scope;
result = fast_object_copy_.TryCopyGraphFast(root.ptr());
if (result.ptr() != Marker()) {
if (fast_object_copy_.exception_msg_ == nullptr) {
result_array.SetAt(0, result);
fast_object_copy_.tmp_ = fast_object_copy_.raw_objects_to_rehash_;
result_array.SetAt(1, fast_object_copy_.tmp_);
fast_object_copy_.tmp_ = fast_object_copy_.raw_expandos_to_rehash_;
result_array.SetAt(2, fast_object_copy_.tmp_);
HandlifyExternalTypedData();
HandlifyTransferables();
return result_array.ptr();
}
// There are left-over uninitialized objects we'll have to make GC
// visible.
SwitchToSlowFowardingList();
}
}
if (FLAG_gc_on_foc_slow_path) {
// We use kLowMemory to force the GC to compact, which is more likely to
// discover untracked pointers (and other issues, like incorrect class
// table).
thread_->heap()->CollectAllGarbage(GCReason::kLowMemory);
}
    // The fast copy failed, either due to
    // - a failure to allocate into new space, or
    // - hitting an object we cannot copy on the fast path.
ASSERT(fast_object_copy_.exception_msg_ != nullptr);
if (fast_object_copy_.exception_msg_ != kFastAllocationFailed) {
*exception_msg = fast_object_copy_.exception_msg_;
return Marker();
}
ASSERT(fast_object_copy_.exception_msg_ == kFastAllocationFailed);
}
// Use the slow copy approach.
result = slow_object_copy_.ContinueCopyGraphSlow(root, result);
ASSERT((result.ptr() == Marker()) ==
(slow_object_copy_.exception_msg_ != nullptr));
if (result.ptr() == Marker()) {
*exception_msg = slow_object_copy_.exception_msg_;
return Marker();
}
result_array.SetAt(0, result);
result_array.SetAt(1, slow_object_copy_.objects_to_rehash_);
result_array.SetAt(2, slow_object_copy_.expandos_to_rehash_);
return result_array.ptr();
}
void SwitchToSlowFowardingList() {
auto& fast_forward_map = fast_object_copy_.fast_forward_map_;
auto& slow_forward_map = slow_object_copy_.slow_forward_map_;
MakeUninitializedNewSpaceObjectsGCSafe();
HandlifyTransferables();
HandlifyWeakProperties();
[vm] Implement `WeakReference` in the VM

This CL implements `WeakReference` in the VM.

* This reduces the size of weak references from 2 objects using 8 words to 1 object using 4 words.
* This makes loads of weak reference targets a single load instead of two.
* This avoids the fix-point in the GC and message object copying for weak references. (N.b. Weak references need to be processed _after_ the fix-point for weak properties.)

The semantics of weak references in messages is that their target gets set to `null` if the target is not included in the message by a strong reference. The tests take particular care to exercise the case where a weak reference's target is only kept alive because a weak property key is alive and it refers to the target in its value. This exercises the fact that weak references need to be processed last.

Does not add support for weak references in the app snapshot. It would be dead code until we start using weak references in for example the CFE.

This CL does not try to unify weak references and weak properties in the GC or messaging (as proposed in go/dart-vm-weakreference), because their semantics differ enough.

Closes: https://github.com/dart-lang/sdk/issues/48162
TEST=runtime/tests/vm/dart/finalizer/weak_reference_run_gc_test.dart
TEST=runtime/tests/vm/dart/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/lib/isolate/weak_reference_message_1_test.dart
TEST=tests/lib/isolate/weak_reference_message_2_test.dart

Change-Id: I3810e919a5866f3ae8a95bd9aa23a880a0b0921c
Cq-Include-Trybots: luci.dart.try:app-kernel-linux-debug-x64-try,dart-sdk-mac-arm64-try,vm-canary-linux-debug-try,vm-fuchsia-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-linux-debug-simriscv64-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-x64-try,vm-kernel-nnbd-linux-release-ia32-try,vm-kernel-nnbd-linux-release-simarm64-try,vm-kernel-nnbd-linux-release-simarm-try,vm-kernel-nnbd-mac-debug-arm64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-win-release-ia32-try,vm-kernel-nnbd-win-release-x64-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-android-release-arm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-kernel-win-debug-ia32-try,vm-kernel-win-debug-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/232087
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-02-10 21:59:41 +00:00
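The processing order described in the `WeakReference` commit message above (weak properties to a fix-point first, weak references cleared last) can be sketched as a toy model. Integers stand in for heap objects, `copied` is the set reached by strong references during the copy, and all names are illustrative rather than VM APIs:

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy model of post-copy weak processing order.
struct WeakProperty { int key; int value; };
struct WeakRef { int target; bool cleared = false; };

void ProcessWeakState(std::set<int>* copied,
                      const std::vector<WeakProperty>& props,
                      std::vector<WeakRef>* refs) {
  // 1. Weak properties: run to a fix-point, because retaining a key makes
  //    its value strongly reachable, which may in turn retain further keys.
  bool changed = true;
  while (changed) {
    changed = false;
    for (const WeakProperty& p : props) {
      if (copied->count(p.key) != 0 && copied->count(p.value) == 0) {
        copied->insert(p.value);
        changed = true;
      }
    }
  }
  // 2. Weak references last: any target not kept alive by the fix-point
  //    above gets cleared (its target becomes null in the message).
  for (WeakRef& r : *refs) {
    if (copied->count(r.target) == 0) r.cleared = true;
  }
}
```

Running the weak-reference pass after the weak-property fix-point is what makes the tricky case work: a target kept alive only through a weak property's value is still live when the weak references are examined.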
HandlifyWeakReferences();
[vm/concurrency] Implement a fast transitive object copy for isolate message passing We use message passing as comunication mechanism between isolates. The transitive closure of an object to be sent is currently serialized into a snapshot form and deserialized on the receiver side. Furthermore the receiver side will re-hash any linked hashmaps in that graph. If isolate gropus are enabled we have all isolates in a group work on the same heap. That removes the need to use an intermediate serialization format. It also removes the need for an O(n) step on the receiver side. This CL implements a fast transitive object copy implementation and makes use of it a message that is to be passed to another isolate stays within the same isolate group. In the common case the object graph will fit into new space. So the copy algorithm will try to take advantage of it by having a fast path and a fallback path. Both of them effectively copy the graph in BFS order. The algorithm works effectively like a scavenge operation, but instead of first copying the from-object to the to-space and then re-writing the object in to-space to forward the pointers (which requires us writing to the to-space memory twice), we only reserve space for to-objects and then initialize the to-objects to it's final contents, including forwarded pointers (i.e. write the to-space object only once). Compared with a scavenge operation (which stores forwarding pointers in the objects themselves), we use a [WeakTable] to store them. This is the only remaining expensive part of the algorithm and could be further optimized. To avoid relying on iterating the to-space, we'll remember [from, to] addresses. => All of this works inside a [NoSafepointOperationScope] and avoids usages of handles as well as write barriers. While doing the transitive object copy, we'll share any object we can safely share (canonical objects, strings, sendports, ...) instead of copying it. 
If the fast path fails (due to allocation failure or hitting) we'll handlify any raw pointers and continue almost the same algorithm in a safe way, where GC is possible at every object allocation site and normal barriers are used for any stores of object pointers. The copy algorithm uses templates to share the copy logic between the fast and slow case (same copy routines can work on raw pointers as well as handles). There's a few special things to take into consideration: * If we copy a view on external typed data we need to know the external typed data address to compute the inner pointer of the view, so we'll eagerly initialize external typed data. * All external typed data needs to get a finalizer attached (irrespective if the object copy suceeds or not) to ensure the `malloc()`ed data is freed again. * Transferables will only be transferred on successful transitive copies. Also they need to attach finalizers to objects (which requires all objects be in handles). * We copy linked hashmaps as they are - instead of compressing the data by removing deleted entries. We may need to re-hash those hashmaps on the receiver side (similar to the snapshot-based copy approach) since new object graph will have no identity hash codes assigned to them. Though if the hashmaps only has sharable objects as keys (very common, e.g. json) there is no need for re-hashing. 
It changes the SendPort.* benchmarks as follows:

```
Benchmark                                   | default            | IG                    | IG + FOC
----------------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw):              | 0.25 us (1 x)      | 0.26 us (0.96 x)      | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw):        | 4.15 us (1 x)      | 1.45 us (2.86 x)      | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw):         | 82.16 us (1 x)     | 27.17 us (3.02 x)     | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw):        | 784.70 us (1 x)    | 242.10 us (3.24 x)    | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw):       | 8510.4 us (1 x)    | 3083.80 us (2.76 x)   | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw):         | 122381.33 us (1 x) | 62959.40 us (1.94 x)  | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw):     | 1.91 us (1 x)      | 0.92 us (2.08 x)      | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw):     | 6.32 us (1 x)      | 2.70 us (2.34 x)      | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw):     | 25.24 us (1 x)     | 10.47 us (2.41 x)     | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw):     | 104.08 us (1 x)    | 41.08 us (2.53 x)     | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw):    | 373.39 us (1 x)    | 174.11 us (2.14 x)    | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw):    | 1588.64 us (1 x)   | 893.18 us (1.78 x)    | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw):    | 6849.55 us (1 x)   | 3705.19 us (1.85 x)   | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw):           | 0.67 us (1 x)      | 0.69 us (0.97 x)      | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw):     | 4.37 us (1 x)      | 0.78 us (5.60 x)      | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw):      | 45.67 us (1 x)     | 0.90 us (50.74 x)     | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw):     | 498.81 us (1 x)    | 1.24 us (402.27 x)    | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw):    | 5366.02 us (1 x)   | 4.22 us (1271.57 x)   | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw):      | 101050.88 us (1 x) | 20.81 us (4855.88 x)  | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw):  | 3.91 us (1 x)      | 0.76 us (5.14 x)      | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw):  | 9.90 us (1 x)      | 0.79 us (12.53 x)     | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw):  | 33.09 us (1 x)     | 0.87 us (38.03 x)     | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw):  | 126.77 us (1 x)    | 0.92 us (137.79 x)    | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x)    | 0.94 us (567.12 x)    | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x)   | 3.03 us (733.74 x)    | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x)   | 4.03 us (2219.77 x)   | 4.30 us (2080.39 x)
```

Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
HandlifyExternalTypedData();
HandlifyObjectsToReHash();
HandlifyExpandosToReHash();
HandlifyFromToObjects();
slow_forward_map.fill_cursor_ = fast_forward_map.fill_cursor_;
}
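The commit notes above describe the copy as a scavenge-like BFS that records forwardings in a side table (the VM's WeakTable) instead of in the objects themselves, so each to-object is written exactly once with already-forwarded pointers. A minimal standalone sketch of that idea, using `std::unordered_map` in place of the WeakTable and a toy `Node` type (all names here are hypothetical, not VM APIs):

```cpp
#include <queue>
#include <unordered_map>
#include <vector>

// Toy object: a node with child pointers, standing in for a Dart object.
struct Node {
  int value;
  std::vector<Node*> children;
};

// BFS transitive copy. The forwarding map plays the role of the VM's
// WeakTable: it maps every visited from-object to its to-object, so shared
// nodes are copied exactly once and cycles terminate.
Node* CopyGraph(Node* root, std::vector<Node*>* allocations) {
  std::unordered_map<Node*, Node*> forward;
  std::queue<Node*> worklist;

  auto forward_to = [&](Node* from) -> Node* {
    auto it = forward.find(from);
    if (it != forward.end()) return it->second;  // Already copied: share it.
    Node* to = new Node{from->value, {}};        // Reserve + copy payload.
    allocations->push_back(to);
    forward[from] = to;
    worklist.push(from);
    return to;
  };

  Node* copy = forward_to(root);
  while (!worklist.empty()) {
    Node* from = worklist.front();
    worklist.pop();
    Node* to = forward[from];
    // Initialize the to-object's pointer fields once, with forwarded values.
    for (Node* child : from->children) {
      to->children.push_back(forward_to(child));
    }
  }
  return copy;
}
```

Diamond-shaped sharing (two parents pointing at the same child) survives the copy because the second lookup hits the forwarding map instead of allocating again.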
void MakeUninitializedNewSpaceObjectsGCSafe() {
auto& fast_forward_map = fast_object_copy_.fast_forward_map_;
const auto length = fast_forward_map.raw_from_to_.length();
const auto cursor = fast_forward_map.fill_cursor_;
for (intptr_t i = cursor; i < length; i += 2) {
auto from = fast_forward_map.raw_from_to_[i];
auto to = fast_forward_map.raw_from_to_[i + 1];
const uword tags = TagsFromUntaggedObject(from.untag());
const intptr_t cid = UntaggedObject::ClassIdTag::decode(tags);
// External typed data is already initialized.
if (!IsExternalTypedDataClassId(cid) && !IsTypedDataViewClassId(cid)) {
#if defined(DART_COMPRESSED_POINTERS)
const bool compressed = true;
#else
const bool compressed = false;
#endif
Object::InitializeObject(reinterpret_cast<uword>(to.untag()), cid,
from.untag()->HeapSize(), compressed);
UpdateLengthField(cid, from, to);
}
}
}
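The loop above walks the flat from/to worklist: pairs are stored as (from, to) at consecutive even/odd indices, and the fill cursor separates pairs whose to-object was already initialized on the fast path from pairs where to-space was merely reserved and must now be made parseable by the GC. A toy sketch of that layout and cursor convention (names hypothetical):

```cpp
#include <cstddef>
#include <vector>

// The forwarding worklist stores (from, to) pairs flattened into one array:
// from-objects at even indices, their to-objects at odd indices. The fill
// cursor (always even) marks how far initialization has progressed; every
// pair at or past the cursor has space reserved but no contents yet.
int CountUninitializedPairs(const std::vector<void*>& raw_from_to,
                            size_t fill_cursor) {
  int count = 0;
  for (size_t i = fill_cursor; i < raw_from_to.size(); i += 2) {
    ++count;  // raw_from_to[i] is a from-object, [i + 1] its reserved copy.
  }
  return count;
}
```

With four pairs recorded and a cursor at index 4, the first two pairs are done and two remain to be made GC-safe.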
void HandlifyTransferables() {
Handlify(&fast_object_copy_.fast_forward_map_.raw_transferables_from_to_,
&slow_object_copy_.slow_forward_map_.transferables_from_to_);
}
void HandlifyWeakProperties() {
Handlify(&fast_object_copy_.fast_forward_map_.raw_weak_properties_,
&slow_object_copy_.slow_forward_map_.weak_properties_);
}
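Each of these `Handlify*` helpers does the same thing: drain a vector of raw object pointers, which are only safe while GC cannot run, into handle-backed storage that the GC visits and updates, so the slow path can allocate (and trigger GC) freely. A minimal sketch of that conversion with stand-in types (hypothetical, not the VM's `Handle` machinery):

```cpp
#include <memory>
#include <vector>

// Stand-ins for VM types: a raw, GC-movable object pointer and a handle
// that the GC would visit and update, keeping it valid across safepoints.
struct RawObject {
  int payload;
};
struct Handle {
  RawObject* ptr;
};

// Handlify: move a raw-pointer worklist into handle-backed storage. After
// this, the raw vector must no longer be consulted, because a GC at the
// next safepoint could move the objects it pointed to.
void Handlify(std::vector<RawObject*>* raw,
              std::vector<std::unique_ptr<Handle>>* handles) {
  handles->reserve(handles->size() + raw->size());
  for (RawObject* obj : *raw) {
    handles->push_back(std::unique_ptr<Handle>(new Handle{obj}));
  }
  raw->clear();  // The raw pointers are now owned by the handles only.
}
```

The real implementation is a template over the different per-kind vectors (transferables, weak properties, weak references, external typed data, from/to pairs), which is why the calls above all look alike.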
[vm] Implement `WeakReference` in the VM

This CL implements `WeakReference` in the VM.

* This reduces the size of weak references from 2 objects using 8 words to 1 object using 4 words.
* This makes loads of weak reference targets a single load instead of two.
* This avoids the fix-point in the GC and message object copying for weak references. (N.b. weak references need to be processed _after_ the fix-point for weak properties.)

The semantics of weak references in messages is that their target gets set to `null` if the target is not included in the message by a strong reference. The tests take particular care to exercise the case where a weak reference's target is only kept alive because a weak property key is alive and it refers to the target in its value. This exercises the fact that weak references need to be processed last.

Does not add support for weak references in the app snapshot. It would be dead code until we start using weak references in for example the CFE.

This CL does not try to unify weak references and weak properties in the GC or messaging (as proposed in go/dart-vm-weakreference), because their semantics differ enough.

Closes: https://github.com/dart-lang/sdk/issues/48162
TEST=runtime/tests/vm/dart/finalizer/weak_reference_run_gc_test.dart
TEST=runtime/tests/vm/dart/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/lib/isolate/weak_reference_message_1_test.dart
TEST=tests/lib/isolate/weak_reference_message_2_test.dart
Change-Id: I3810e919a5866f3ae8a95bd9aa23a880a0b0921c
Cq-Include-Trybots: luci.dart.try:app-kernel-linux-debug-x64-try,dart-sdk-mac-arm64-try,vm-canary-linux-debug-try,vm-fuchsia-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-linux-debug-simriscv64-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-x64-try,vm-kernel-nnbd-linux-release-ia32-try,vm-kernel-nnbd-linux-release-simarm64-try,vm-kernel-nnbd-linux-release-simarm-try,vm-kernel-nnbd-mac-debug-arm64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-win-release-ia32-try,vm-kernel-nnbd-win-release-x64-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-android-release-arm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-kernel-win-debug-ia32-try,vm-kernel-win-debug-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/232087
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-02-10 21:59:41 +00:00
void HandlifyWeakReferences() {
Handlify(&fast_object_copy_.fast_forward_map_.raw_weak_references_,
&slow_object_copy_.slow_forward_map_.weak_references_);
}
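The `WeakReference` commit note above stresses an ordering constraint: weak properties need a fix-point (a reachable key can make its value reachable, and that value can be the key of another weak property), and weak reference targets may only be nulled after that fix-point, since a target might be kept alive solely through such a chain. A toy sketch of the two phases over integer object IDs (types and names hypothetical):

```cpp
#include <set>
#include <vector>

using Obj = int;  // Toy object identity; -1 stands for null.

// Weak property: its value is only retained if its key is reachable.
struct WeakProperty {
  Obj key, value;
};
// Weak reference: its target is nulled if not otherwise reachable.
struct WeakRef {
  Obj target;
};

void ProcessWeaks(std::set<Obj>* reachable,
                  const std::vector<WeakProperty>& props,
                  std::vector<WeakRef>* refs) {
  // Phase 1: fix-point over weak properties. A newly reachable value can
  // itself be the key of another weak property, so iterate until stable.
  bool changed = true;
  while (changed) {
    changed = false;
    for (const WeakProperty& p : props) {
      if (reachable->count(p.key) != 0 && reachable->count(p.value) == 0) {
        reachable->insert(p.value);
        changed = true;
      }
    }
  }
  // Phase 2: only now clear weak references, so a target kept alive via a
  // weak-property chain is not nulled prematurely.
  for (WeakRef& r : *refs) {
    if (reachable->count(r.target) == 0) r.target = -1;
  }
}
```

Running the phases in the opposite order would wrongly null a weak reference whose target only becomes reachable during the weak-property fix-point.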
void HandlifyExternalTypedData() {
Handlify(&fast_object_copy_.fast_forward_map_.raw_external_typed_data_to_,
&slow_object_copy_.slow_forward_map_.external_typed_data_);
}
void HandlifyObjectsToReHash() {
Handlify(&fast_object_copy_.fast_forward_map_.raw_objects_to_rehash_,
&slow_object_copy_.slow_forward_map_.objects_to_rehash_);
}
void HandlifyExpandosToReHash() {
Handlify(&fast_object_copy_.fast_forward_map_.raw_expandos_to_rehash_,
&slow_object_copy_.slow_forward_map_.expandos_to_rehash_);
}
  // Converts raw object pointers collected on the fast (no-safepoint) path
  // into zone handles so the slow path can allocate (and trigger GC) safely.
  template <typename RawType, typename HandleType>
  void Handlify(GrowableArray<RawType>* from,
                GrowableArray<const HandleType*>* to) {
const auto length = from->length();
if (length > 0) {
to->Resize(length);
for (intptr_t i = 0; i < length; i++) {
(*to)[i] = &HandleType::Handle(Z, (*from)[i]);
}
from->Clear();
}
}
  // Converts the raw [from, to] pairs recorded by the fast path into handles.
  // Entries below the fill cursor have already been processed, so their
  // from-slots are no longer needed and are cleared to nullptr.
  void HandlifyFromToObjects() {
auto& fast_forward_map = fast_object_copy_.fast_forward_map_;
auto& slow_forward_map = slow_object_copy_.slow_forward_map_;
const intptr_t cursor = fast_forward_map.fill_cursor_;
const intptr_t length = fast_forward_map.raw_from_to_.length();
slow_forward_map.from_to_.Resize(length);
for (intptr_t i = 2; i < length; i += 2) {
slow_forward_map.from_to_[i] =
i < cursor ? nullptr
: &Object::Handle(Z, fast_forward_map.raw_from_to_[i]);
slow_forward_map.from_to_[i + 1] =
&Object::Handle(Z, fast_forward_map.raw_from_to_[i + 1]);
}
fast_forward_map.raw_from_to_.Clear();
}
  // Throws an ArgumentError carrying [exception_msg]; does not return.
  void ThrowException(const char* exception_msg) {
const auto& msg_obj = String::Handle(Z, String::New(exception_msg));
const auto& args = Array::Handle(Z, Array::New(1));
args.SetAt(0, msg_obj);
Exceptions::ThrowByType(Exceptions::kArgument, args);
UNREACHABLE();
}
Thread* thread_;
Zone* zone_;
FastObjectCopy fast_object_copy_;
SlowObjectCopy slow_object_copy_;
};
ObjectPtr CopyMutableObjectGraph(const Object& object) {
auto thread = Thread::Current();
ObjectGraphCopier copier(thread);
return copier.CopyObjectGraph(object);
}
} // namespace dart