dart-sdk/runtime/vm/object_graph_copy.h


[vm/concurrency] Implement a fast transitive object copy for isolate message passing

We use message passing as the communication mechanism between isolates. The transitive closure of an object to be sent is currently serialized into a snapshot form and deserialized on the receiver side. Furthermore, the receiver side will re-hash any linked hashmaps in that graph.

If isolate groups are enabled, all isolates in a group work on the same heap. That removes the need for an intermediate serialization format. It also removes the need for an O(n) step on the receiver side.

This CL implements a fast transitive object copy and uses it if a message that is to be passed to another isolate stays within the same isolate group.

In the common case the object graph will fit into new space, so the copy algorithm tries to take advantage of that by having a fast path and a fallback path. Both of them effectively copy the graph in BFS order.

The algorithm works much like a scavenge operation. A scavenge first copies the from-object to to-space and then rewrites the object in to-space to forward the pointers, which writes the to-space memory twice. Instead, we only reserve space for to-objects and then initialize the to-objects to their final contents, including forwarded pointers (i.e. each to-space object is written only once).

A scavenge operation stores forwarding pointers in the objects themselves, whereas we use a [WeakTable] to store them. This is the only remaining expensive part of the algorithm and could be optimized further. To avoid relying on iterating the to-space, we remember [from, to] addresses.

Consequently, all of this works inside a [NoSafepointOperationScope] and avoids the use of handles as well as write barriers.

While doing the transitive object copy, we share any object we can safely share (canonical objects, strings, sendports, ...) instead of copying it.
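The copy scheme described above can be sketched on a toy heap. This is a minimal sketch under stated assumptions, not the VM's implementation:

* `Obj`, `ToSpace`, and `CopyGraph` are hypothetical names.
* A `std::unordered_map` stands in for the [WeakTable] forwarding table.
* A heap allocation stands in for reserving to-space.

It shows the ideas named in the text. A worklist of remembered [from, to] pairs gives BFS order. Each to-object's contents are filled in once, with pointers that are already forwarded. Shareable objects are referenced rather than copied.

```cpp
#include <deque>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy heap object; not the Dart VM's object layout.
struct Obj {
  bool shareable = false;    // e.g. canonical or otherwise immutable objects
  int payload = 0;           // stand-in for non-pointer fields
  std::vector<Obj*> fields;  // outgoing object pointers
};

// Hypothetical toy "to-space" that owns every copied object.
struct ToSpace {
  std::vector<std::unique_ptr<Obj>> objects;
  // Stand-in for reserving space: the contents are filled in later.
  Obj* Reserve() {
    objects.push_back(std::make_unique<Obj>());
    return objects.back().get();
  }
};

// Copies the graph reachable from `root` in BFS order.
Obj* CopyGraph(Obj* root, ToSpace& to) {
  std::unordered_map<Obj*, Obj*> forward;    // from -> to (WeakTable stand-in)
  std::deque<std::pair<Obj*, Obj*>> work;    // remembered [from, to] pairs

  // Returns the forwarded pointer for `from`, reserving a copy on first visit.
  auto forward_or_reserve = [&](Obj* from) -> Obj* {
    if (from == nullptr || from->shareable) return from;  // share, don't copy
    auto it = forward.find(from);
    if (it != forward.end()) return it->second;           // already copied
    Obj* copy = to.Reserve();
    forward.emplace(from, copy);
    work.emplace_back(from, copy);
    return copy;
  };

  Obj* result = forward_or_reserve(root);
  while (!work.empty()) {
    auto [from, copy] = work.front();
    work.pop_front();
    // Fill in the to-object once, storing already-forwarded pointers.
    copy->payload = from->payload;
    copy->fields.reserve(from->fields.size());
    for (Obj* field : from->fields) {
      copy->fields.push_back(forward_or_reserve(field));
    }
  }
  return result;
}
```

Cycles are preserved because the forwarding table is consulted before any new space is reserved. Shared objects keep their identity in the copied graph.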
If the fast path fails (due to allocation failure or hitting) we handlify any raw pointers and continue almost the same algorithm in a safe way. In this mode GC is possible at every object allocation site, and normal barriers are used for any stores of object pointers.

The copy algorithm uses templates to share the copy logic between the fast and slow cases (the same copy routines can work on raw pointers as well as handles).

There are a few special things to take into consideration:

* If we copy a view on external typed data, we need to know the external typed data address to compute the inner pointer of the view, so we eagerly initialize external typed data.
* All external typed data needs to get a finalizer attached (irrespective of whether the object copy succeeds or not) to ensure the `malloc()`ed data is freed again.
* Transferables are only transferred on successful transitive copies. They also need to attach finalizers to objects, which requires all objects to be in handles.
* We copy linked hashmaps as they are, instead of compressing the data by removing deleted entries. We may need to re-hash those hashmaps on the receiver side (similar to the snapshot-based copy approach), since the new object graph will have no identity hash codes assigned. However, if a hashmap only has sharable objects as keys (very common, e.g. JSON), there is no need for re-hashing.
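The template approach for sharing copy logic between the fast and slow paths might look roughly like the sketch below. All names are hypothetical:

* `RawPolicy` works on raw pointers with plain stores.
* `HandlePolicy` reaches objects through an index table that stands in for handles. It counts pointer stores as a placeholder for a write barrier.

Pointer forwarding is omitted so the focus stays on the template itself.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy object with one scalar field and one pointer field.
struct Node {
  int value;
  Node* next;
};

// Fast path: references are raw pointers; stores skip the write barrier.
struct RawPolicy {
  using Ref = Node*;
  Node* Get(Ref ref) const { return ref; }
  void StorePointer(Node* obj, Node* value) { obj->next = value; }
};

// Slow path: references are indices into a handle table (a moving GC could
// update the table entries), and pointer stores go through a barrier stand-in.
struct HandlePolicy {
  using Ref = std::size_t;
  std::vector<Node*> handles;
  int barrier_calls = 0;

  Ref Handlify(Node* obj) {
    handles.push_back(obj);
    return handles.size() - 1;
  }
  Node* Get(Ref ref) const { return handles[ref]; }
  void StorePointer(Node* obj, Node* value) {
    ++barrier_calls;  // placeholder for a real write barrier
    obj->next = value;
  }
};

// One copy routine, instantiated once per policy.
template <typename Policy>
void CopyNode(Policy& policy, typename Policy::Ref from, typename Policy::Ref to) {
  Node* src = policy.Get(from);
  Node* dst = policy.Get(to);
  dst->value = src->value;              // scalar: plain copy in both modes
  policy.StorePointer(dst, src->next);  // pointer: the policy decides on barriers
}
```

The copy routine is written once. Only the way objects are reached and pointers are stored differs between the two instantiations.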
It changes the SendPort.* benchmarks as follows (IG = isolate groups enabled, FOC = fast object copy):

```
Benchmark                                   | default            | IG                   | IG + FOC
-------------------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw):              | 0.25 us (1 x)      | 0.26 us (0.96 x)     | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw):        | 4.15 us (1 x)      | 1.45 us (2.86 x)     | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw):         | 82.16 us (1 x)     | 27.17 us (3.02 x)    | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw):        | 784.70 us (1 x)    | 242.10 us (3.24 x)   | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw):       | 8510.4 us (1 x)    | 3083.80 us (2.76 x)  | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw):         | 122381.33 us (1 x) | 62959.40 us (1.94 x) | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw):     | 1.91 us (1 x)      | 0.92 us (2.08 x)     | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw):     | 6.32 us (1 x)      | 2.70 us (2.34 x)     | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw):     | 25.24 us (1 x)     | 10.47 us (2.41 x)    | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw):     | 104.08 us (1 x)    | 41.08 us (2.53 x)    | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw):    | 373.39 us (1 x)    | 174.11 us (2.14 x)   | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw):    | 1588.64 us (1 x)   | 893.18 us (1.78 x)   | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw):    | 6849.55 us (1 x)   | 3705.19 us (1.85 x)  | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw):           | 0.67 us (1 x)      | 0.69 us (0.97 x)     | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw):     | 4.37 us (1 x)      | 0.78 us (5.60 x)     | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw):      | 45.67 us (1 x)     | 0.90 us (50.74 x)    | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw):     | 498.81 us (1 x)    | 1.24 us (402.27 x)   | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw):    | 5366.02 us (1 x)   | 4.22 us (1271.57 x)  | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw):      | 101050.88 us (1 x) | 20.81 us (4855.88 x) | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw):  | 3.91 us (1 x)      | 0.76 us (5.14 x)     | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw):  | 9.90 us (1 x)      | 0.79 us (12.53 x)    | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw):  | 33.09 us (1 x)     | 0.87 us (38.03 x)    | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw):  | 126.77 us (1 x)    | 0.92 us (137.79 x)   | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x)    | 0.94 us (567.12 x)   | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x)   | 3.03 us (733.74 x)   | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x)   | 4.03 us (2219.77 x)  | 4.30 us (2080.39 x)
```

Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test

Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
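The "(N x)" figures in the benchmark table are consistent with dividing the `default` time by the time in the same row. For example, 4.15 us / 1.05 us is about 3.95 for `SendPort.Send.Json.400B` with IG + FOC.

The tiny sketch below recomputes a few rows to show how the column can be read. The helper names `Speedup` and `MatchesPrinted` are hypothetical. The formula is inferred from the numbers rather than stated in the commit.

```cpp
#include <cmath>

// Speedup relative to the "default" column (inferred from the table's figures).
double Speedup(double default_us, double variant_us) {
  return default_us / variant_us;
}

// True if `computed` agrees with a figure printed to two decimal places.
bool MatchesPrinted(double computed, double printed) {
  return std::fabs(computed - printed) < 0.005;
}
```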
// Copyright (c) 2021, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
#ifndef RUNTIME_VM_OBJECT_GRAPH_COPY_H_
#define RUNTIME_VM_OBJECT_GRAPH_COPY_H_
namespace dart {
class Object;
class ObjectPtr;
// Makes a transitive copy of the object graph referenced by [root]. Objects
// that can be safely shared (e.g. because they are immutable) are not copied.
//
// The result will be an array of length 3 of the format
//
// [
// <message>,
// <collection-lib-objects-to-rehash>,
// <core-lib-objects-to-rehash>,
// ]
//
// If either array of objects to rehash is not `null`, the receiver should
// re-hash the objects it contains.
ObjectPtr CopyMutableObjectGraph(const Object& root);
} // namespace dart
#endif // RUNTIME_VM_OBJECT_GRAPH_COPY_H_
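The sketch below is a toy model of how the three-element result documented for `CopyMutableObjectGraph` might be consumed. Everything in it is hypothetical: `ToyMap`, `ToyCopyResult`, `NeedsRehash`, and `RehashListed`. In this model:

* `std::nullopt` plays the role of `null`.
* `Rehash()` only sets a flag, where a real hashmap would rebuild its index.
* `NeedsRehash` encodes one plausible sender-side rule based on the commit message. A copied map whose keys were all shared keeps valid hashes. A map with any copied key does not, since the new object graph has no identity hash codes assigned.

```cpp
#include <initializer_list>
#include <optional>
#include <vector>

// Hypothetical stand-in for a copied linked hash map.
struct ToyMap {
  std::vector<bool> key_is_shared;  // per key: shared (true) or copied (false)
  bool rehashed = false;
  void Rehash() { rehashed = true; }  // placeholder for rebuilding the index
};

// Mirrors [<message>, <collection-lib-objects-to-rehash>, <core-lib-objects-to-rehash>].
struct ToyCopyResult {
  const void* message = nullptr;
  std::optional<std::vector<ToyMap*>> collection_to_rehash;
  std::optional<std::vector<ToyMap*>> core_to_rehash;
};

// Sender side (assumed rule): only maps with at least one copied key need re-hashing.
bool NeedsRehash(const ToyMap& map) {
  for (bool shared : map.key_is_shared) {
    if (!shared) return true;
  }
  return false;
}

// Receiver side: re-hash every listed map, skipping lists that are null.
int RehashListed(ToyCopyResult& result) {
  int rehashed = 0;
  for (std::optional<std::vector<ToyMap*>>* list :
       {&result.collection_to_rehash, &result.core_to_rehash}) {
    if (!list->has_value()) continue;
    for (ToyMap* map : **list) {
      map->Rehash();
      ++rehashed;
    }
  }
  return rehashed;
}
```

A JSON-like map whose keys are all strings never appears in either list, which matches the commit message's observation that such maps need no re-hashing.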