[vm/concurrency] Implement a fast transitive object copy for isolate message passing
We use message passing as the communication mechanism between isolates.
The transitive closure of an object to be sent is currently serialized
into a snapshot form and deserialized on the receiver side. Furthermore
the receiver side will re-hash any linked hashmaps in that graph.
If isolate groups are enabled, all isolates in a group work on
the same heap. That removes the need to use an intermediate
serialization format. It also removes the need for an O(n) step on the
receiver side.
This CL implements a fast transitive object copy and makes use of it
whenever a message that is to be passed to another isolate stays
within the same isolate group.
In the common case the object graph will fit into new space, so the
copy algorithm tries to take advantage of this by having a fast path
and a fallback path. Both of them effectively copy the graph in BFS
order.
The algorithm effectively works like a scavenge operation, but instead
of first copying the from-object to the to-space and then re-writing the
object in to-space to forward the pointers (which requires writing to
the to-space memory twice), we only reserve space for to-objects and
then initialize each to-object to its final contents, including
forwarded pointers (i.e. we write each to-space object only once).
Compared with a scavenge operation (which stores forwarding pointers in
the objects themselves), we use a [WeakTable] to store them. This is the
only remaining expensive part of the algorithm and could be further
optimized. To avoid relying on iterating the to-space, we'll remember
[from, to] addresses.
=> All of this works inside a [NoSafepointOperationScope] and avoids
usages of handles as well as write barriers.
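The single-write BFS copy described above can be sketched as follows. This is a hypothetical, simplified model: `Node` stands in for a VM object with one pointer field, and a `std::unordered_map` stands in for the `WeakTable` of from-object to to-object forwarding entries. Each to-object is written exactly once, with its pointer field already forwarded:

```cpp
#include <cassert>
#include <queue>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a VM object with a single pointer field.
struct Node {
  int value;
  Node* next;  // may form cycles
};

// BFS transitive copy: allocation only reserves space for the to-object;
// the to-object is initialized once, with pointers already forwarded.
Node* TransitiveCopy(Node* root, std::vector<Node*>* allocations) {
  if (root == nullptr) return nullptr;
  std::unordered_map<Node*, Node*> forwarding;  // stand-in for WeakTable
  std::queue<Node*> worklist;

  auto forward = [&](Node* from) -> Node* {
    if (from == nullptr) return nullptr;
    auto it = forwarding.find(from);
    if (it != forwarding.end()) return it->second;
    Node* to = new Node();  // reserve space only; contents written later
    allocations->push_back(to);
    forwarding[from] = to;
    worklist.push(from);
    return to;
  };

  Node* copy = forward(root);
  while (!worklist.empty()) {
    Node* from = worklist.front();
    worklist.pop();
    Node* to = forwarding[from];
    // Single write of the to-object: payload plus already-forwarded pointer.
    to->value = from->value;
    to->next = forward(from->next);
  }
  return copy;
}
```

Note how a cyclic graph is handled naturally: the forwarding table returns the already-reserved to-object instead of copying the same from-object twice.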
While doing the transitive object copy, we'll share any object we can
safely share (canonical objects, strings, sendports, ...) instead of
copying it.
If the fast path fails (e.g. due to allocation failure) we'll
handlify any raw pointers and continue with almost the same algorithm in
a safe way, where GC is possible at every object allocation site and
normal barriers are used for any stores of object pointers.
The copy algorithm uses templates to share the copy logic between the
fast and slow cases (the same copy routines can work on raw pointers as
well as handles).
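The template-sharing idea can be sketched like this. This is not the VM's actual code, only a minimal model: an access policy abstracts over a raw-pointer representation (fast path) and a handle-like indirection (slow path), so one copy routine serves both:

```cpp
#include <cassert>

// Hypothetical object with one field.
struct Obj {
  int field;
};

// Fast case: works directly on raw pointers.
struct RawAccess {
  static int Load(Obj* o) { return o->field; }
  static void Store(Obj* o, int v) { o->field = v; }
};

// Slow case: works through an indirection standing in for a GC-safe handle.
struct Handle {
  Obj* ptr;
};
struct HandleAccess {
  static int Load(Handle h) { return h.ptr->field; }
  static void Store(Handle h, int v) { h.ptr->field = v; }
};

// One copy routine; two instantiations share the same logic.
template <typename Access, typename Ref>
void CopyField(Ref from, Ref to) {
  Access::Store(to, Access::Load(from));
}
```

The design choice is that only the load/store primitives differ between the paths; the traversal and copy logic is written once.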
There are a few special things to take into consideration:
* If we copy a view on external typed data we need to know the
external typed data address to compute the inner pointer of the
view, so we'll eagerly initialize external typed data.
* All external typed data needs to get a finalizer attached
(irrespective of whether the object copy succeeds or not) to ensure the
`malloc()`ed data is freed again.
* Transferables will only be transferred on successful transitive
copies. Also they need to attach finalizers to objects (which
requires all objects to be in handles).
* We copy linked hashmaps as they are - instead of compacting the
data by removing deleted entries. We may need to re-hash those
hashmaps on the receiver side (similar to the snapshot-based copy
approach) since objects in the new graph will have no identity hash
codes assigned yet. Though if a hashmap only has sharable objects as
keys (very common, e.g. JSON) there is no need for re-hashing.
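The re-hashing condition in the last bullet comes down to how the key hash is derived. A small illustrative sketch (hypothetical `Key` type, not VM code): a content-based hash survives copying the key object, while an identity (address-based) hash does not, so identity-keyed maps must be re-hashed after the copy:

```cpp
#include <cassert>
#include <functional>
#include <string>

struct Key {
  std::string s;
};

// Content-based hash: stable across copies of the key object, so a map
// keyed this way (e.g. string keys, common for JSON) needs no re-hash.
size_t ContentHash(const Key& k) { return std::hash<std::string>()(k.s); }

// Identity-based hash: derived from the object's address, so a copied key
// lands in a different bucket and the map must be re-hashed.
size_t IdentityHash(const Key& k) {
  return std::hash<const void*>()(static_cast<const void*>(&k));
}
```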
It changes the SendPort.* benchmarks as follows (IG = isolate groups
enabled, FOC = fast object copy):
```
Benchmark | default | IG | IG + FOC
----------------------------------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw): | 0.25 us (1 x) | 0.26 us (0.96 x) | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw): | 4.15 us (1 x) | 1.45 us (2.86 x) | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw): | 82.16 us (1 x) | 27.17 us (3.02 x) | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw): | 784.70 us (1 x) | 242.10 us (3.24 x) | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw): | 8510.4 us (1 x) | 3083.80 us (2.76 x) | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw): | 122381.33 us (1 x) | 62959.40 us (1.94 x) | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw): | 1.91 us (1 x) | 0.92 us (2.08 x) | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw): | 6.32 us (1 x) | 2.70 us (2.34 x) | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw): | 25.24 us (1 x) | 10.47 us (2.41 x) | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw): | 104.08 us (1 x) | 41.08 us (2.53 x) | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw): | 373.39 us (1 x) | 174.11 us (2.14 x) | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw): | 1588.64 us (1 x) | 893.18 us (1.78 x) | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw): | 6849.55 us (1 x) | 3705.19 us (1.85 x) | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw): | 0.67 us (1 x) | 0.69 us (0.97 x) | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw): | 4.37 us (1 x) | 0.78 us (5.60 x) | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw): | 45.67 us (1 x) | 0.90 us (50.74 x) | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw): | 498.81 us (1 x) | 1.24 us (402.27 x) | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw): | 5366.02 us (1 x) | 4.22 us (1271.57 x) | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw): | 101050.88 us (1 x) | 20.81 us (4855.88 x) | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw): | 3.91 us (1 x) | 0.76 us (5.14 x) | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw): | 9.90 us (1 x) | 0.79 us (12.53 x) | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw): | 33.09 us (1 x) | 0.87 us (38.03 x) | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw): | 126.77 us (1 x) | 0.92 us (137.79 x) | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x) | 0.94 us (567.12 x) | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x) | 3.03 us (733.74 x) | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
// Copyright (c) 2021, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
#include "vm/object_graph_copy.h"
#include "vm/dart_api_state.h"
#include "vm/flags.h"
#include "vm/heap/weak_table.h"
#include "vm/longjump.h"
#include "vm/object.h"
#include "vm/object_store.h"
#include "vm/snapshot.h"
#include "vm/symbols.h"
#include "vm/timeline.h"
#define Z zone_
// The list here contains two kinds of classes of objects
// * objects that will be shared and we will therefore never need to copy
// * objects that user object graphs should never reference
#define FOR_UNSUPPORTED_CLASSES(V) \
V(AbstractType) \
V(ApiError) \
V(Bool) \
V(CallSiteData) \
V(Capability) \
V(Class) \
V(ClosureData) \
V(Code) \
V(CodeSourceMap) \
V(CompressedStackMaps) \
V(ContextScope) \
V(DynamicLibrary) \
V(Error) \
V(ExceptionHandlers) \
V(FfiTrampolineData) \
V(Field) \
Reland "[vm] Implement `Finalizer`"
Original CL in patchset 1.
Split-off https://dart-review.googlesource.com/c/sdk/+/238341
And pulled in fix https://dart-review.googlesource.com/c/sdk/+/238582
(Should merge cleanly when this lands later.)
This CL implements the `Finalizer` in the GC.
The GC is specially aware of two types of objects for the purposes of
running finalizers.
1) `FinalizerEntry`
2) `Finalizer` (`FinalizerBase`, `_FinalizerImpl`)
A `FinalizerEntry` contains the `value`, the optional `detach` key, and
the `token`, and a reference to the `finalizer`.
An entry only holds on weakly to the value, detach key, and finalizer.
(Similar to how `WeakReference` only holds on weakly to target).
A `Finalizer` contains all entries, a list of entries whose values have
been collected, and a reference to the isolate.
When the value of an entry is GCed, the entry is moved over to the
collected list.
If any entry is moved to the collected list, a message is sent that
invokes the finalizer to call the callback on all entries in that list.
When a finalizer is detached by the user, the entry token is set to the
entry itself and is removed from the all entries set.
This ensures that if the entry was already moved to the collected list,
the finalizer is not executed.
To speed up detaching, we use a weak map from detach keys to list of
entries. This ensures entries can be GCed.
Both the scavenger and marker tasks process finalizer entries in
parallel.
Parallel tasks use an atomic exchange on the head of the collected
entries list, ensuring no entries get lost.
The mutator thread is guaranteed to be stopped when processing entries.
This ensures that we do not need barriers for moving entries into the
finalizers collected list.
Dart reads and replaces the collected entries list also with an atomic
exchange, ensuring the GC doesn't run in between a load/store.
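The lock-free hand-off of the collected-entries list described above can be sketched with standard atomics. This is a hypothetical model, not the VM's code: GC tasks push entries with a compare-exchange on the head, and the draining side detaches the whole list with one atomic exchange, so no entry can be lost in between:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a FinalizerEntry.
struct Entry {
  int id;
  Entry* next = nullptr;
};

std::atomic<Entry*> collected_head{nullptr};

// Called by parallel GC tasks: Treiber-style push onto the collected list.
void PushCollected(Entry* e) {
  Entry* old_head = collected_head.load(std::memory_order_relaxed);
  do {
    e->next = old_head;
  } while (!collected_head.compare_exchange_weak(
      old_head, e, std::memory_order_release, std::memory_order_relaxed));
}

// Called by the consuming side: one exchange detaches the entire list,
// so entries pushed concurrently afterwards start a fresh list.
Entry* DrainCollected() {
  return collected_head.exchange(nullptr, std::memory_order_acquire);
}
```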
When a finalizer gets posted a message to process finalized objects, it
is kept alive by the message.
An alternative design would be to pre-allocate a `WeakReference` in the
finalizer pointing to the finalizer, and send that itself.
This would be at the cost of an extra object.
Send and exit is not supported in this CL, support will be added in a
follow up CL. Trying to send will throw.
Bug: https://github.com/dart-lang/sdk/issues/47777
TEST=runtime/tests/vm/dart/finalizer/*
TEST=runtime/tests/vm/dart_2/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
Change-Id: Ibdfeadc16d5d69ade50aae5b9f794284c4c4dbab
Cq-Include-Trybots: luci.dart.try:vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-ffi-android-debug-arm64c-try,dart-sdk-mac-arm64-try,vm-kernel-mac-release-arm64-try,pkg-mac-release-arm64-try,vm-kernel-precomp-nnbd-mac-release-arm64-try,vm-kernel-win-debug-x64c-try,vm-kernel-win-debug-x64-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-nnbd-win-release-ia32-try,vm-ffi-android-debug-arm-try,vm-precomp-ffi-qemu-linux-release-arm-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-ia32-try,benchmark-linux-try,flutter-analyze-try,flutter-frontend-try,pkg-linux-debug-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-linux-debug-simarm_x64-try,vm-kernel-precomp-obfuscate-linux-release-x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/238086
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-03-25 10:29:30 +00:00
V(Finalizer) \
V(FinalizerBase) \
V(FinalizerEntry) \
[vm] Implement `NativeFinalizer`
This CL implements `NativeFinalizer` in the GC.
`FinalizerEntry`s are extended to track `external_size` and in which
`Heap::Space` the finalizable value is.
On attaching a native finalizer, the external size is added to the
relevant heap. When the finalizable value is promoted from new to old
space, the external size is promoted as well. And when a native
finalizer is run or is detached, the external size is removed from the
relevant heap again.
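The external-size accounting described above can be sketched as a pair of per-space counters. This is a simplified, hypothetical model (the names are illustrative, not the VM's): attach adds to the space holding the value, promotion moves the size from new to old space, and running or detaching the finalizer removes it:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-space external-size bookkeeping for native finalizers.
struct ExternalSizes {
  int64_t new_space = 0;
  int64_t old_space = 0;

  // Attaching a native finalizer: charge the space holding the value.
  void Attach(bool in_new_space, int64_t size) {
    (in_new_space ? new_space : old_space) += size;
  }
  // The finalizable value survived a scavenge: move the charge along.
  void Promote(int64_t size) {
    new_space -= size;
    old_space += size;
  }
  // The finalizer ran or was detached: remove the charge.
  void Detach(bool in_new_space, int64_t size) {
    (in_new_space ? new_space : old_space) -= size;
  }
};
```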
In contrast to Dart `Finalizer`s, `NativeFinalizer`s are run on isolate
shutdown.
When the `NativeFinalizer`s themselves are collected, the finalizers are
not run. Users should stick the native finalizer in a global variable to
ensure finalization. We will revisit this design when we add send and
exit support, because there is a design space to explore what to do in
that case. This current solution promises the least to users.
In this implementation native finalizers have a Dart entry to clean up
the entries from the `all_entries` field of the finalizer. We should
consider using another data structure that avoids the need for this Dart
entry. See the TODO left in the code.
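The external-size bookkeeping described above can be sketched with a toy model. This is a minimal illustration, not the VM's actual classes; all names here are invented:

```cpp
#include <cassert>
#include <cstdint>

// Toy model of native-finalizer external-size accounting: attaching a
// finalizer charges the external size to the space holding the finalizable
// value, promotion moves the charge from new to old space, and running or
// detaching the finalizer removes it again.
enum class Space { kNew, kOld };

struct ExternalSizes {
  int64_t new_space = 0;
  int64_t old_space = 0;

  int64_t& of(Space s) { return s == Space::kNew ? new_space : old_space; }
  void Attach(Space s, int64_t bytes) { of(s) += bytes; }      // on attach
  void Promote(int64_t bytes) {                                // on scavenge
    new_space -= bytes;
    old_space += bytes;
  }
  void RunOrDetach(Space s, int64_t bytes) { of(s) -= bytes; }  // on finalize
};
```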
Bug: https://github.com/dart-lang/sdk/issues/47777
TEST=runtime/tests/vm/dart(_2)/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/ffi(_2)/vmspecific_native_finalizer_*
Change-Id: I8f594c80c3c344ad83e1f2de10de028eb8456121
Cq-Include-Trybots: luci.dart.try:vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-ffi-android-debug-arm64c-try,dart-sdk-mac-arm64-try,vm-kernel-mac-release-arm64-try,pkg-mac-release-arm64-try,vm-kernel-precomp-nnbd-mac-release-arm64-try,vm-kernel-win-debug-x64c-try,vm-kernel-win-debug-x64-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-nnbd-win-release-ia32-try,vm-ffi-android-debug-arm-try,vm-precomp-ffi-qemu-linux-release-arm-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-ia32-try,benchmark-linux-try,flutter-frontend-try,pkg-linux-debug-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-linux-debug-simarm_x64-try,vm-kernel-precomp-obfuscate-linux-release-x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/236320
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-03-26 09:41:21 +00:00
V(NativeFinalizer) \
V(Function) \
V(FunctionType) \
V(FutureOr) \
V(ICData) \
V(Instance) \
V(Instructions) \
V(InstructionsSection) \
V(InstructionsTable) \
V(Int32x4) \
V(Integer) \
V(KernelProgramInfo) \
V(LanguageError) \
V(Library) \
V(LibraryPrefix) \
V(LoadingUnit) \
V(LocalVarDescriptors) \
V(MegamorphicCache) \
V(Mint) \
V(MirrorReference) \
V(MonomorphicSmiableCall) \
V(Namespace) \
V(Number) \
V(ObjectPool) \
V(PatchClass) \
V(PcDescriptors) \
V(Pointer) \
V(ReceivePort) \
V(RegExp) \
V(Script) \
V(Sentinel) \
V(SendPort) \
V(SingleTargetCache) \
V(Smi) \
V(StackTrace) \
V(SubtypeTestCache) \
V(SuspendState) \
V(Type) \
V(TypeArguments) \
V(TypeParameter) \
V(TypeParameters) \
V(TypeRef) \
V(TypedDataBase) \
V(UnhandledException) \
V(UnlinkedCall) \
V(UnwindError) \
V(UserTag) \
V(WeakSerializationReference)
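The `V(...)` entries above follow the X-macro pattern: the list expands a caller-supplied macro once per class, so one list definition can generate many constructs (type aliases, cast helpers, and so on). A minimal standalone sketch of the same idiom, with invented names:

```cpp
#include <cassert>

// A toy class list in the style of CLASS_LIST_FOR_HANDLES: each entry is an
// invocation of the macro parameter V.
#define TOY_CLASS_LIST(V)                                                      \
  V(Alpha)                                                                     \
  V(Beta)                                                                      \
  V(Gamma)

// Expand the list once to count its entries at compile time.
#define COUNT_ONE(name) +1
constexpr int kToyClassCount = 0 TOY_CLASS_LIST(COUNT_ONE);
#undef COUNT_ONE
```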
namespace dart {

DEFINE_FLAG(bool,
            enable_fast_object_copy,
            true,
            "Enable fast path for fast object copy.");
DEFINE_FLAG(bool,
            gc_on_foc_slow_path,
            false,
            "Cause a GC when falling off the fast path for fast object copy.");

const char* kFastAllocationFailed = "fast allocation failed";

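The `kFastAllocationFailed` sentinel reflects the fast-path/fallback structure these flags control: the fast copy may fail, in which case the caller records why and retries on a path that always succeeds. An illustrative sketch, not VM code (an artificial size budget stands in for new-space allocation failure):

```cpp
#include <cassert>
#include <optional>
#include <string>

// Toy fast path: fails when the payload exceeds the "allocation" budget.
inline std::optional<std::string> TryFastCopy(const std::string& s,
                                              size_t budget) {
  if (s.size() > budget) return std::nullopt;  // fast allocation failed
  return s;
}

// Caller: attempt the fast path, record the sentinel reason on failure, and
// fall back to a slow path that always succeeds.
inline std::string CopyWithFallback(const std::string& s,
                                    size_t budget,
                                    const char** status) {
  static const char* kFailed = "fast allocation failed";
  if (auto fast = TryFastCopy(s, budget)) {
    *status = "fast";
    return *fast;
  }
  *status = kFailed;  // why we fell off the fast path
  return s;           // slow path
}
```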
struct PtrTypes {
  using Object = ObjectPtr;

  static const dart::UntaggedObject* UntagObject(Object arg) {
    return arg.untag();
  }
  static const dart::ObjectPtr GetObjectPtr(Object arg) { return arg; }
  static const dart::Object& HandlifyObject(ObjectPtr arg) {
    return dart::Object::Handle(arg);
  }

#define DO(V)                                                                  \
  using V = V##Ptr;                                                            \
  static Untagged##V* Untag##V(V##Ptr arg) { return arg.untag(); }             \
  static V##Ptr Get##V##Ptr(V##Ptr arg) { return arg; }                        \
  static V##Ptr Cast##V(ObjectPtr arg) { return dart::V::RawCast(arg); }
  CLASS_LIST_FOR_HANDLES(DO)
#undef DO
};

struct HandleTypes {
  using Object = const dart::Object&;

  static const dart::UntaggedObject* UntagObject(Object arg) {
    return arg.ptr().untag();
  }
  static dart::ObjectPtr GetObjectPtr(Object arg) { return arg.ptr(); }
  static Object HandlifyObject(Object arg) { return arg; }

#define DO(V)                                                                  \
  using V = const dart::V&;                                                    \
  static Untagged##V* Untag##V(V arg) { return arg.ptr().untag(); }            \
  static V##Ptr Get##V##Ptr(V arg) { return arg.ptr(); }                       \
  static V Cast##V(const dart::Object& arg) { return dart::V::Cast(arg); }
  CLASS_LIST_FOR_HANDLES(DO)
#undef DO
};

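`PtrTypes` and `HandleTypes` act as type policies: the same templated copy routine can be instantiated over raw pointers (fast path) or handles (slow path). A reduced sketch of the pattern, with invented stand-in types in place of the VM's object model:

```cpp
#include <cassert>

// Stand-in "object": just an int reached through two representations.
struct RawPolicy {
  using Object = const int*;
  static int Read(Object o) { return *o; }
};

struct HandlePolicy {
  struct Handle { const int* ptr; };  // a handle wraps the raw pointer
  using Object = Handle;
  static int Read(Object o) { return *o.ptr; }
};

// One routine, two instantiations -- mirroring how the copy logic is shared
// between the fast (raw pointer) and slow (handlified) paths.
template <typename Types>
int ReadThrough(typename Types::Object obj) {
  return Types::Read(obj);
}
```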
DART_FORCE_INLINE
static ObjectPtr Marker() {
  return Object::unknown_constant().ptr();
}

// Keep in sync with runtime/lib/isolate.cc:ValidateMessageObject
|
DART_FORCE_INLINE
static bool CanShareObject(ObjectPtr obj, uword tags) {
  if ((tags & UntaggedObject::CanonicalBit::mask_in_place()) != 0) {
    return true;
  }
  const auto cid = UntaggedObject::ClassIdTag::decode(tags);
  if (cid == kOneByteStringCid) return true;
  if (cid == kTwoByteStringCid) return true;
  if (cid == kExternalOneByteStringCid) return true;
  if (cid == kExternalTwoByteStringCid) return true;
  if (cid == kMintCid) return true;
  if (cid == kImmutableArrayCid) return true;
  if (cid == kNeverCid) return true;
  if (cid == kSentinelCid) return true;
  if (cid == kStackTraceCid) return true;
#if defined(DART_PRECOMPILED_RUNTIME)
  // In JIT mode we have field guards enabled, which means
  // double/float32x4/float64x2 boxes can be mutable and we therefore cannot
  // share them.
  if (cid == kDoubleCid || cid == kFloat32x4Cid || cid == kFloat64x2Cid) {
    return true;
  }
#endif
  if (cid == kInt32x4Cid) return true;  // No field guards here.
  if (cid == kSendPortCid) return true;
  if (cid == kCapabilityCid) return true;
  if (cid == kRegExpCid) return true;

  if (cid == kClosureCid) {
    // We can share a closure iff it doesn't close over any state.
    return Closure::RawCast(obj)->untag()->context() == Object::null();
  }
  return false;
}

// Whether executing `get:hashCode` (possibly in a different isolate) on an
// object with the given [tags] might return a different answer than on the
// source object (if copying is needed) or on the same object (if the object
// is shared).
DART_FORCE_INLINE
static bool MightNeedReHashing(ObjectPtr object) {
  const uword tags = TagsFromUntaggedObject(object.untag());
  const auto cid = UntaggedObject::ClassIdTag::decode(tags);
  // These use structural hash codes and will therefore always result in the
  // same hash codes.
  if (cid == kOneByteStringCid) return false;
  if (cid == kTwoByteStringCid) return false;
  if (cid == kExternalOneByteStringCid) return false;
  if (cid == kExternalTwoByteStringCid) return false;
  if (cid == kMintCid) return false;
  if (cid == kDoubleCid) return false;
  if (cid == kBoolCid) return false;
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
  if (cid == kSendPortCid) return false;
  if (cid == kCapabilityCid) return false;
  if (cid == kNullCid) return false;
  // These are shared and use identity hash codes. If they are used as a key in
  // a map or a value in a set, they will already have the identity hash code
  // set.
  if (cid == kImmutableArrayCid) return false;
  if (cid == kRegExpCid) return false;
  if (cid == kInt32x4Cid) return false;

  // We copy those (instead of sharing them) - see [CanShareObject]. They rely
  // on the default hashCode implementation which uses identity hash codes
  // (instead of a structural hash code).
  if (cid == kFloat32x4Cid || cid == kFloat64x2Cid) {
    return !kDartPrecompiledRuntime;
  }

  // If the [tags] indicate this is a canonical object we'll share it instead
  // of copying it. That would suggest we don't have to re-hash maps/sets
  // containing this object on the receiver side.
  //
  // Though the object can be a constant of a user-defined class with a custom
  // hash code that is misbehaving (e.g. one that depends on global field
  // state, ...). To be on the safe side we'll force re-hashing if such objects
  // are encountered in maps/sets.
  //
  // => We might want to consider changing the implementation to avoid
  // rehashing in such cases in the future and disambiguate the documentation.
  return true;
}

DART_FORCE_INLINE
uword TagsFromUntaggedObject(UntaggedObject* obj) {
  return obj->tags_;
}

DART_FORCE_INLINE
void SetNewSpaceTaggingWord(ObjectPtr to, classid_t cid, uint32_t size) {
  uword tags = 0;

  tags = UntaggedObject::SizeTag::update(size, tags);
  tags = UntaggedObject::ClassIdTag::update(cid, tags);
  tags = UntaggedObject::OldBit::update(false, tags);
  tags = UntaggedObject::OldAndNotMarkedBit::update(false, tags);
  tags = UntaggedObject::OldAndNotRememberedBit::update(false, tags);
  tags = UntaggedObject::CanonicalBit::update(false, tags);
  tags = UntaggedObject::NewBit::update(true, tags);
#if defined(HASH_IN_OBJECT_HEADER)
  tags = UntaggedObject::HashTag::update(0, tags);
#endif
  to.untag()->tags_ = tags;
}

DART_FORCE_INLINE
ObjectPtr AllocateObject(intptr_t cid, intptr_t size) {
#if defined(DART_COMPRESSED_POINTERS)
  const bool compressed = true;
#else
  const bool compressed = false;
#endif
  return Object::Allocate(cid, size, Heap::kNew, compressed);
}

DART_FORCE_INLINE
void UpdateLengthField(intptr_t cid, ObjectPtr from, ObjectPtr to) {
  // We share these objects - never copy them.
  ASSERT(!IsStringClassId(cid));
  ASSERT(cid != kImmutableArrayCid);

  // We update any in-heap variable sized object with the length to keep the
  // length and the size in the object header in-sync for the GC.
  if (cid == kArrayCid) {
    static_cast<UntaggedArray*>(to.untag())->length_ =
        static_cast<UntaggedArray*>(from.untag())->length_;
  } else if (cid == kContextCid) {
    static_cast<UntaggedContext*>(to.untag())->num_variables_ =
        static_cast<UntaggedContext*>(from.untag())->num_variables_;
  } else if (IsTypedDataClassId(cid)) {
    static_cast<UntaggedTypedDataBase*>(to.untag())->length_ =
        static_cast<UntaggedTypedDataBase*>(from.untag())->length_;
  }
}

void InitializeExternalTypedData(intptr_t cid,
                                 ExternalTypedDataPtr from,
                                 ExternalTypedDataPtr to) {
  auto raw_from = from.untag();
  auto raw_to = to.untag();
  const intptr_t length =
      TypedData::ElementSizeInBytes(cid) * Smi::Value(raw_from->length_);

  auto buffer = static_cast<uint8_t*>(malloc(length));
  memmove(buffer, raw_from->data_, length);
  raw_to->length_ = raw_from->length_;
  raw_to->data_ = buffer;
}

template <typename T>
void CopyTypedDataBaseWithSafepointChecks(Thread* thread,
                                          const T& from,
                                          const T& to,
                                          intptr_t length) {
  constexpr intptr_t kChunkSize = 100 * 1024;

  const intptr_t chunks = length / kChunkSize;
  const intptr_t remainder = length % kChunkSize;

  // Notice we re-load the data pointer, since T may be TypedData in which case
  // the interior pointer may change after checking into safepoints.
  for (intptr_t i = 0; i < chunks; ++i) {
    memmove(to.ptr().untag()->data_ + i * kChunkSize,
            from.ptr().untag()->data_ + i * kChunkSize, kChunkSize);

    thread->CheckForSafepoint();
  }
  if (remainder > 0) {
    memmove(to.ptr().untag()->data_ + chunks * kChunkSize,
            from.ptr().untag()->data_ + chunks * kChunkSize, remainder);
  }
}

void InitializeExternalTypedDataWithSafepointChecks(
    Thread* thread,
    intptr_t cid,
    const ExternalTypedData& from,
    const ExternalTypedData& to) {
  const intptr_t length_in_elements = from.Length();
  const intptr_t length_in_bytes =
      TypedData::ElementSizeInBytes(cid) * length_in_elements;

  uint8_t* to_data = static_cast<uint8_t*>(malloc(length_in_bytes));
  to.ptr().untag()->data_ = to_data;
  to.ptr().untag()->length_ = Smi::New(length_in_elements);

  CopyTypedDataBaseWithSafepointChecks(thread, from, to, length_in_bytes);
}
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
void InitializeTypedDataView(TypedDataViewPtr obj) {
  obj.untag()->typed_data_ = TypedDataBase::null();
  obj.untag()->offset_in_bytes_ = 0;
  obj.untag()->length_ = 0;
}

void FreeExternalTypedData(void* isolate_callback_data, void* buffer) {
  free(buffer);
}

void FreeTransferablePeer(void* isolate_callback_data, void* peer) {
  delete static_cast<TransferableTypedDataPeer*>(peer);
}

class ForwardMapBase {
 public:
  explicit ForwardMapBase(Thread* thread)
      : thread_(thread), zone_(thread->zone()), isolate_(thread->isolate()) {}

 protected:
  friend class ObjectGraphCopier;

  intptr_t GetObjectId(ObjectPtr object) {
    if (object->IsNewObject()) {
      return isolate_->forward_table_new()->GetValueExclusive(object);
    } else {
      return isolate_->forward_table_old()->GetValueExclusive(object);
    }
  }
  void SetObjectId(ObjectPtr object, intptr_t id) {
    if (object->IsNewObject()) {
      isolate_->forward_table_new()->SetValueExclusive(object, id);
    } else {
      isolate_->forward_table_old()->SetValueExclusive(object, id);
    }
  }

  void FinalizeTransferable(const TransferableTypedData& from,
                            const TransferableTypedData& to) {
    // Get the old peer.
    auto fpeer = static_cast<TransferableTypedDataPeer*>(
        thread_->heap()->GetPeer(from.ptr()));
    ASSERT(fpeer != nullptr && fpeer->data() != nullptr);
    const intptr_t length = fpeer->length();

    // Allocate new peer object with (data, length).
    auto tpeer = new TransferableTypedDataPeer(fpeer->data(), length);
    thread_->heap()->SetPeer(to.ptr(), tpeer);

    // Move the handle itself to the new object.
    fpeer->handle()->EnsureFreedExternal(thread_->isolate_group());
    tpeer->set_handle(FinalizablePersistentHandle::New(
        thread_->isolate_group(), to, tpeer, FreeTransferablePeer, length,
        /*auto_delete=*/true));
    fpeer->ClearData();
  }

  void FinalizeExternalTypedData(const ExternalTypedData& to) {
    to.AddFinalizer(to.DataAddr(0), &FreeExternalTypedData, to.LengthInBytes());
  }

  Thread* thread_;
  Zone* zone_;
  Isolate* isolate_;

 private:
  DISALLOW_COPY_AND_ASSIGN(ForwardMapBase);
};

class FastForwardMap : public ForwardMapBase {
 public:
  explicit FastForwardMap(Thread* thread)
      : ForwardMapBase(thread),
        raw_from_to_(thread->zone(), 20),
        raw_transferables_from_to_(thread->zone(), 0),
        raw_objects_to_rehash_(thread->zone(), 0),
        raw_expandos_to_rehash_(thread->zone(), 0) {
    raw_from_to_.Resize(2);
    raw_from_to_[0] = Object::null();
    raw_from_to_[1] = Object::null();
    fill_cursor_ = 2;
  }

  ObjectPtr ForwardedObject(ObjectPtr object) {
    const intptr_t id = GetObjectId(object);
    if (id == 0) return Marker();
    return raw_from_to_[id + 1];
  }

  void Insert(ObjectPtr from, ObjectPtr to, intptr_t size) {
    ASSERT(ForwardedObject(from) == Marker());
    const auto id = raw_from_to_.length();
    SetObjectId(from, id);
    raw_from_to_.Resize(id + 2);
    raw_from_to_[id] = from;
    raw_from_to_[id + 1] = to;
    allocated_bytes += size;
  }

  void AddTransferable(TransferableTypedDataPtr from,
                       TransferableTypedDataPtr to) {
    raw_transferables_from_to_.Add(from);
    raw_transferables_from_to_.Add(to);
  }
  void AddWeakProperty(WeakPropertyPtr from) { raw_weak_properties_.Add(from); }
  void AddWeakReference(WeakReferencePtr from) {
    raw_weak_references_.Add(from);
  }
|
[vm/concurrency] Implement a fast transitive object copy for isolate message passing
We use message passing as comunication mechanism between isolates.
The transitive closure of an object to be sent is currently serialized
into a snapshot form and deserialized on the receiver side. Furthermore
the receiver side will re-hash any linked hashmaps in that graph.
If isolate gropus are enabled we have all isolates in a group work on
the same heap. That removes the need to use an intermediate
serialization format. It also removes the need for an O(n) step on the
receiver side.
This CL implements a fast transitive object copy implementation and
makes use of it a message that is to be passed to another isolate stays
within the same isolate group.
In the common case the object graph will fit into new space. So the
copy algorithm will try to take advantage of it by having a fast path
and a fallback path. Both of them effectively copy the graph in BFS
order.
The algorithm works effectively like a scavenge operation, but instead
of first copying the from-object to the to-space and then re-writing the
object in to-space to forward the pointers (which requires us writing to
the to-space memory twice), we only reserve space for to-objects and
then initialize the to-objects to it's final contents, including
forwarded pointers (i.e. write the to-space object only once).
Compared with a scavenge operation (which stores forwarding pointers in
the objects themselves), we use a [WeakTable] to store them. This is the
only remaining expensive part of the algorithm and could be further
optimized. To avoid relying on iterating the to-space, we'll remember
[from, to] addresses.
=> All of this works inside a [NoSafepointOperationScope] and avoids
usages of handles as well as write barriers.
While doing the transitive object copy, we'll share any object we can
safely share (canonical objects, strings, sendports, ...) instead of
copying it.
If the fast path fails (due to allocation failure or hitting) we'll
handlify any raw pointers and continue almost the same algorithm in a
safe way, where GC is possible at every object allocation site and
normal barriers are used for any stores of object pointers.
The copy algorithm uses templates to share the copy logic between the
fast and slow case (same copy routines can work on raw pointers as well
as handles).
There's a few special things to take into consideration:
* If we copy a view on external typed data we need to know the
external typed data address to compute the inner pointer of the
view, so we'll eagerly initialize external typed data.
* All external typed data needs to get a finalizer attached
(irrespective if the object copy suceeds or not) to ensure the
`malloc()`ed data is freed again.
* Transferables will only be transferred on successful transitive
copies. Also they need to attach finalizers to objects (which
requires all objects be in handles).
* We copy linked hashmaps as they are - instead of compressing the
data by removing deleted entries. We may need to re-hash those
hashmaps on the receiver side (similar to the snapshot-based copy
approach) since the new object graph will have no identity hash codes
assigned to it. Though if a hashmap only has sharable objects
as keys (very common, e.g. json) there is no need for re-hashing.
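The re-hash decision in the last bullet can be sketched as follows. This is an illustrative simplification (`Key` and `NeedsRehash` are hypothetical names): shared keys keep their hashes across the copy, so a copied map only needs re-hashing when at least one key was itself copied:

```cpp
#include <vector>

// Stand-in for a hashmap key: shared keys (canonical objects, strings,
// sendports, ...) keep their hash across the copy; copied keys get fresh
// identity hashes in the new graph.
struct Key {
  bool shared;
};

// The receiver must re-hash a copied map only if at least one key was
// itself copied (its identity hash differs in the new graph).
bool NeedsRehash(const std::vector<Key>& keys) {
  for (const Key& k : keys) {
    if (!k.shared) return true;
  }
  return false;
}
```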
It changes the SendPort.* benchmarks as follows:
```
Benchmark | default | IG | IG + FOC
----------------------------------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw): | 0.25 us (1 x) | 0.26 us (0.96 x) | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw): | 4.15 us (1 x) | 1.45 us (2.86 x) | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw): | 82.16 us (1 x) | 27.17 us (3.02 x) | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw): | 784.70 us (1 x) | 242.10 us (3.24 x) | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw): | 8510.4 us (1 x) | 3083.80 us (2.76 x) | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw): | 122381.33 us (1 x) | 62959.40 us (1.94 x) | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw): | 1.91 us (1 x) | 0.92 us (2.08 x) | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw): | 6.32 us (1 x) | 2.70 us (2.34 x) | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw): | 25.24 us (1 x) | 10.47 us (2.41 x) | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw): | 104.08 us (1 x) | 41.08 us (2.53 x) | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw): | 373.39 us (1 x) | 174.11 us (2.14 x) | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw): | 1588.64 us (1 x) | 893.18 us (1.78 x) | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw): | 6849.55 us (1 x) | 3705.19 us (1.85 x) | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw): | 0.67 us (1 x) | 0.69 us (0.97 x) | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw): | 4.37 us (1 x) | 0.78 us (5.60 x) | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw): | 45.67 us (1 x) | 0.90 us (50.74 x) | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw): | 498.81 us (1 x) | 1.24 us (402.27 x) | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw): | 5366.02 us (1 x) | 4.22 us (1271.57 x) | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw): | 101050.88 us (1 x) | 20.81 us (4855.88 x) | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw): | 3.91 us (1 x) | 0.76 us (5.14 x) | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw): | 9.90 us (1 x) | 0.79 us (12.53 x) | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw): | 33.09 us (1 x) | 0.87 us (38.03 x) | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw): | 126.77 us (1 x) | 0.92 us (137.79 x) | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x) | 0.94 us (567.12 x) | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x) | 3.03 us (733.74 x) | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
void AddExternalTypedData(ExternalTypedDataPtr to) {
  raw_external_typed_data_to_.Add(to);
}

void AddObjectToRehash(ObjectPtr to) { raw_objects_to_rehash_.Add(to); }
void AddExpandoToRehash(ObjectPtr to) { raw_expandos_to_rehash_.Add(to); }
 private:
  friend class FastObjectCopy;
  friend class ObjectGraphCopier;

  GrowableArray<ObjectPtr> raw_from_to_;
  GrowableArray<TransferableTypedDataPtr> raw_transferables_from_to_;
  GrowableArray<ExternalTypedDataPtr> raw_external_typed_data_to_;
  GrowableArray<ObjectPtr> raw_objects_to_rehash_;
  GrowableArray<ObjectPtr> raw_expandos_to_rehash_;
  GrowableArray<WeakPropertyPtr> raw_weak_properties_;
[vm] Implement `WeakReference` in the VM
This CL implements `WeakReference` in the VM.
* This reduces the size of weak references from 2 objects using 8 words
to 1 object using 4 words.
* This makes loads of weak reference targets a single load instead of
two.
* This avoids the fix-point in the GC and message object copying for
weak references. (N.b. Weak references need to be processed _after_
the fix-point for weak properties.)
The semantics of weak references in messages is that their target gets
set to `null` if the target is not included in the message by a strong
reference.
The tests take particular care to exercise the case where a weak
reference's target is only kept alive because a weak property key is
alive and it refers to the target in its value. This exercises the fact
that weak references need to be processed last.
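The processing-order requirement can be sketched as follows. This is a schematic model, not the VM's implementation: objects are modeled as plain ints, and `ResolveWeak` is a hypothetical name. It shows the fix-point over weak properties (a live key makes its value live, possibly transitively) running to completion before weak references are nulled:

```cpp
#include <map>
#include <set>

// After copying the strongly reachable graph, weak properties are resolved
// to a fix-point (a live key makes its value live, which may make further
// keys live), and only then are weak references nulled if their target was
// never reached.
std::set<int> ResolveWeak(std::set<int> live,
                          const std::map<int, int>& weak_props,  // key -> value
                          std::map<int, bool>& weak_refs) {      // target -> kept?
  bool changed = true;
  while (changed) {  // fix-point over weak properties
    changed = false;
    for (const auto& [key, value] : weak_props) {
      if (live.count(key) != 0 && live.count(value) == 0) {
        live.insert(value);
        changed = true;
      }
    }
  }
  // Weak references are processed last: a target survives only if it became
  // strongly reachable (possibly via a weak property's value).
  for (auto& [target, kept] : weak_refs) {
    kept = live.count(target) != 0;
  }
  return live;
}
```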
Does not add support for weak references in the app snapshot. It would
be dead code until we start using weak references in, for example, the
CFE.
This CL does not try to unify weak references and weak properties in
the GC or messaging (as proposed in go/dart-vm-weakreference), because
their semantics differ enough.
Closes: https://github.com/dart-lang/sdk/issues/48162
TEST=runtime/tests/vm/dart/finalizer/weak_reference_run_gc_test.dart
TEST=runtime/tests/vm/dart/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/lib/isolate/weak_reference_message_1_test.dart
TEST=tests/lib/isolate/weak_reference_message_2_test.dart
Change-Id: I3810e919a5866f3ae8a95bd9aa23a880a0b0921c
Cq-Include-Trybots: luci.dart.try:app-kernel-linux-debug-x64-try,dart-sdk-mac-arm64-try,vm-canary-linux-debug-try,vm-fuchsia-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-linux-debug-simriscv64-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-x64-try,vm-kernel-nnbd-linux-release-ia32-try,vm-kernel-nnbd-linux-release-simarm64-try,vm-kernel-nnbd-linux-release-simarm-try,vm-kernel-nnbd-mac-debug-arm64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-win-release-ia32-try,vm-kernel-nnbd-win-release-x64-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-android-release-arm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-kernel-win-debug-ia32-try,vm-kernel-win-debug-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/232087
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-02-10 21:59:41 +00:00
  GrowableArray<WeakReferencePtr> raw_weak_references_;
  intptr_t fill_cursor_ = 0;
  intptr_t allocated_bytes = 0;
  DISALLOW_COPY_AND_ASSIGN(FastForwardMap);
};

class SlowForwardMap : public ForwardMapBase {
 public:
  explicit SlowForwardMap(Thread* thread)
      : ForwardMapBase(thread),
        from_to_transition_(thread->zone(), 2),
        from_to_(GrowableObjectArray::Handle(thread->zone(),
                                             GrowableObjectArray::New(2))),
      transferables_from_to_(thread->zone(), 0) {
    from_to_transition_.Resize(2);
    from_to_transition_[0] = &PassiveObject::Handle();
    from_to_transition_[1] = &PassiveObject::Handle();
    from_to_.Add(Object::null_object());
    from_to_.Add(Object::null_object());
    fill_cursor_ = 2;
  }

  ObjectPtr ForwardedObject(ObjectPtr object) {
    const intptr_t id = GetObjectId(object);
    if (id == 0) return Marker();
    return from_to_.At(id + 1);
  }

  void Insert(const Object& from, const Object& to, intptr_t size) {
    ASSERT(ForwardedObject(from.ptr()) == Marker());
    const auto id = from_to_.Length();
    SetObjectId(from.ptr(), id);
    from_to_.Add(from);
    from_to_.Add(to);
    allocated_bytes += size;
  }

  void AddTransferable(const TransferableTypedData& from,
                       const TransferableTypedData& to) {
    transferables_from_to_.Add(&TransferableTypedData::Handle(from.ptr()));
    transferables_from_to_.Add(&TransferableTypedData::Handle(to.ptr()));
  }

  void AddWeakProperty(const WeakProperty& from) {
    weak_properties_.Add(&WeakProperty::Handle(from.ptr()));
  }
[vm] Implement `WeakReference` in the VM
This CL implements `WeakReference` in the VM.
* This reduces the size of weak references from 2 objects using 8 words
to 1 object using 4 words.
* This makes loads of weak reference targets a single load instead of
two.
* This avoids the fix-point in the GC and message object copying for
weak references. (N.b. Weak references need to be processed _after_
the fix-point for weak properties.)
The semantics of weak references in messages are that their target gets
set to `null` if the target is not included in the message via a strong
reference.
The tests take particular care to exercise the case where a weak
reference's target is only kept alive because a weak property key is
alive and refers to the target in its value. This exercises the fact
that weak references need to be processed last.
Does not add support for weak references in the app snapshot. It would
be dead code until we start using weak references in, for example, the
CFE.
This CL does not try to unify weak references and weak properties in
the GC or messaging (as proposed in go/dart-vm-weakreference), because
their semantics differ enough.
Closes: https://github.com/dart-lang/sdk/issues/48162
TEST=runtime/tests/vm/dart/finalizer/weak_reference_run_gc_test.dart
TEST=runtime/tests/vm/dart/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/lib/isolate/weak_reference_message_1_test.dart
TEST=tests/lib/isolate/weak_reference_message_2_test.dart
Change-Id: I3810e919a5866f3ae8a95bd9aa23a880a0b0921c
Cq-Include-Trybots: luci.dart.try:app-kernel-linux-debug-x64-try,dart-sdk-mac-arm64-try,vm-canary-linux-debug-try,vm-fuchsia-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-linux-debug-simriscv64-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-x64-try,vm-kernel-nnbd-linux-release-ia32-try,vm-kernel-nnbd-linux-release-simarm64-try,vm-kernel-nnbd-linux-release-simarm-try,vm-kernel-nnbd-mac-debug-arm64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-win-release-ia32-try,vm-kernel-nnbd-win-release-x64-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-android-release-arm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-kernel-win-debug-ia32-try,vm-kernel-win-debug-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/232087
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-02-10 21:59:41 +00:00
  void AddWeakReference(const WeakReference& from) {
    weak_references_.Add(&WeakReference::Handle(from.ptr()));
  }

  const ExternalTypedData& AddExternalTypedData(ExternalTypedDataPtr to) {
    auto to_handle = &ExternalTypedData::Handle(to);
    external_typed_data_.Add(to_handle);
    return *to_handle;
  }

  void AddObjectToRehash(const Object& to) {
    objects_to_rehash_.Add(&Object::Handle(to.ptr()));
  }

  void AddExpandoToRehash(const Object& to) {
    expandos_to_rehash_.Add(&Object::Handle(to.ptr()));
  }
  void FinalizeTransferables() {
    for (intptr_t i = 0; i < transferables_from_to_.length(); i += 2) {
      auto from = transferables_from_to_[i];
      auto to = transferables_from_to_[i + 1];
      FinalizeTransferable(*from, *to);
    }
  }

  void FinalizeExternalTypedData() {
    for (intptr_t i = 0; i < external_typed_data_.length(); i++) {
      auto to = external_typed_data_[i];
      ForwardMapBase::FinalizeExternalTypedData(*to);
    }
  }

 private:
  friend class SlowObjectCopy;
  friend class ObjectGraphCopier;

  GrowableArray<const PassiveObject*> from_to_transition_;
  GrowableObjectArray& from_to_;
  GrowableArray<const TransferableTypedData*> transferables_from_to_;
  GrowableArray<const ExternalTypedData*> external_typed_data_;
  GrowableArray<const Object*> objects_to_rehash_;
  GrowableArray<const Object*> expandos_to_rehash_;
  GrowableArray<const WeakProperty*> weak_properties_;
[vm] Implement `WeakReference` in the VM
This CL implements `WeakReference` in the VM.
* This reduces the size of weak references from 2 objects using 8 words
to 1 object using 4 words.
* This makes loads of weak reference targets a single load instead of
two.
* This avoids the fix-point in the GC and message object copying for
weak references. (N.b. Weak references need to be processed _after_
the fix-point for weak properties.)
The semantics of weak references in messages are that the target gets
set to `null` if it is not included in the message via a strong
reference.
The tests take particular care to exercise the case where a weak
reference's target is kept alive only because a live weak property key
refers to the target in its value. This checks that weak references are
processed last.
This CL does not add support for weak references in app snapshots; that
would be dead code until we start using weak references in, for
example, the CFE.
This CL does not try to unify weak references and weak properties in
the GC or messaging (as proposed in go/dart-vm-weakreference), because
their semantics differ enough.
Closes: https://github.com/dart-lang/sdk/issues/48162
TEST=runtime/tests/vm/dart/finalizer/weak_reference_run_gc_test.dart
TEST=runtime/tests/vm/dart/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/lib/isolate/weak_reference_message_1_test.dart
TEST=tests/lib/isolate/weak_reference_message_2_test.dart
Change-Id: I3810e919a5866f3ae8a95bd9aa23a880a0b0921c
Cq-Include-Trybots: luci.dart.try:app-kernel-linux-debug-x64-try,dart-sdk-mac-arm64-try,vm-canary-linux-debug-try,vm-fuchsia-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-linux-debug-simriscv64-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-x64-try,vm-kernel-nnbd-linux-release-ia32-try,vm-kernel-nnbd-linux-release-simarm64-try,vm-kernel-nnbd-linux-release-simarm-try,vm-kernel-nnbd-mac-debug-arm64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-win-release-ia32-try,vm-kernel-nnbd-win-release-x64-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-android-release-arm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-kernel-win-debug-ia32-try,vm-kernel-win-debug-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/232087
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-02-10 21:59:41 +00:00
  GrowableArray<const WeakReference*> weak_references_;
  intptr_t fill_cursor_ = 0;
  intptr_t allocated_bytes = 0;
  DISALLOW_COPY_AND_ASSIGN(SlowForwardMap);
};

class ObjectCopyBase {
 public:
  explicit ObjectCopyBase(Thread* thread)
      : thread_(thread),
        heap_base_(thread->heap_base()),
        zone_(thread->zone()),
        heap_(thread->isolate_group()->heap()),
        class_table_(thread->isolate_group()->class_table()),
        new_space_(heap_->new_space()),
        tmp_(Object::Handle(thread->zone())),
        to_(Object::Handle(thread->zone())),
        expando_cid_(Class::GetClassId(
            thread->isolate_group()->object_store()->expando_class())) {}
  ~ObjectCopyBase() {}

 protected:
  static ObjectPtr LoadPointer(ObjectPtr src, intptr_t offset) {
    return src.untag()->LoadPointer(reinterpret_cast<ObjectPtr*>(
        reinterpret_cast<uint8_t*>(src.untag()) + offset));
  }
  static CompressedObjectPtr LoadCompressedPointer(ObjectPtr src,
                                                   intptr_t offset) {
    return src.untag()->LoadPointer(reinterpret_cast<CompressedObjectPtr*>(
        reinterpret_cast<uint8_t*>(src.untag()) + offset));
  }
  static compressed_uword LoadCompressedNonPointerWord(ObjectPtr src,
                                                       intptr_t offset) {
    return *reinterpret_cast<compressed_uword*>(
        reinterpret_cast<uint8_t*>(src.untag()) + offset);
  }
  static void StorePointerBarrier(ObjectPtr obj,
                                  intptr_t offset,
                                  ObjectPtr value) {
    obj.untag()->StorePointer(
        reinterpret_cast<ObjectPtr*>(reinterpret_cast<uint8_t*>(obj.untag()) +
                                     offset),
        value);
  }
  static void StoreCompressedPointerBarrier(ObjectPtr obj,
                                            intptr_t offset,
                                            ObjectPtr value) {
    obj.untag()->StoreCompressedPointer(
        reinterpret_cast<CompressedObjectPtr*>(
            reinterpret_cast<uint8_t*>(obj.untag()) + offset),
        value);
  }
  void StoreCompressedLargeArrayPointerBarrier(ObjectPtr obj,
                                               intptr_t offset,
                                               ObjectPtr value) {
    obj.untag()->StoreCompressedArrayPointer(
        reinterpret_cast<CompressedObjectPtr*>(
            reinterpret_cast<uint8_t*>(obj.untag()) + offset),
        value, thread_);
  }
  static void StorePointerNoBarrier(ObjectPtr obj,
                                    intptr_t offset,
                                    ObjectPtr value) {
    *reinterpret_cast<ObjectPtr*>(reinterpret_cast<uint8_t*>(obj.untag()) +
                                  offset) = value;
  }
  template <typename T = ObjectPtr>
  static void StoreCompressedPointerNoBarrier(ObjectPtr obj,
                                              intptr_t offset,
                                              T value) {
    *reinterpret_cast<CompressedObjectPtr*>(
        reinterpret_cast<uint8_t*>(obj.untag()) + offset) = value;
  }
  static void StoreCompressedNonPointerWord(ObjectPtr obj,
                                            intptr_t offset,
                                            compressed_uword value) {
    *reinterpret_cast<compressed_uword*>(
        reinterpret_cast<uint8_t*>(obj.untag()) + offset) = value;
  }

  DART_FORCE_INLINE
  bool CanCopyObject(uword tags, ObjectPtr object) {
    const auto cid = UntaggedObject::ClassIdTag::decode(tags);
    if (cid > kNumPredefinedCids) {
      const bool has_native_fields =
          Class::NumNativeFieldsOf(class_table_->At(cid)) != 0;
      if (has_native_fields) {
        exception_msg_ =
            OS::SCreate(zone_,
                        "Illegal argument in isolate message: (object extends "
                        "NativeWrapper - %s)",
                        Class::Handle(class_table_->At(cid)).ToCString());
        return false;
      }
[vm] Implement `NativeFinalizer`
This CL implements `NativeFinalizer` in the GC.
`FinalizerEntry`s are extended to track `external_size` and in which
`Heap::Space` the finalizable value is.
On attaching a native finalizer, the external size is added to the
relevant heap. When the finalizable value is promoted from new to old
space, the external size is promoted as well. And when a native
finalizer is run or is detached, the external size is removed from the
relevant heap again.
In contrast to Dart `Finalizer`s, `NativeFinalizer`s are run on isolate
shutdown.
When the `NativeFinalizer`s themselves are collected, the finalizers are
not run. Users should stick the native finalizer in a global variable to
ensure finalization. We will revisit this design when we add send and
exit support, because there is a design space to explore what to do in
that case. This current solution promises the least to users.
In this implementation native finalizers have a Dart entry to clean up
the entries from the `all_entries` field of the finalizer. We should
consider using another data structure that avoids the need for this Dart
entry. See the TODO left in the code.
Bug: https://github.com/dart-lang/sdk/issues/47777
TEST=runtime/tests/vm/dart(_2)/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/ffi(_2)/vmspecific_native_finalizer_*
Change-Id: I8f594c80c3c344ad83e1f2de10de028eb8456121
Cq-Include-Trybots: luci.dart.try:vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-ffi-android-debug-arm64c-try,dart-sdk-mac-arm64-try,vm-kernel-mac-release-arm64-try,pkg-mac-release-arm64-try,vm-kernel-precomp-nnbd-mac-release-arm64-try,vm-kernel-win-debug-x64c-try,vm-kernel-win-debug-x64-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-nnbd-win-release-ia32-try,vm-ffi-android-debug-arm-try,vm-precomp-ffi-qemu-linux-release-arm-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-ia32-try,benchmark-linux-try,flutter-frontend-try,pkg-linux-debug-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-linux-debug-simarm_x64-try,vm-kernel-precomp-obfuscate-linux-release-x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/236320
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-03-26 09:41:21 +00:00
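The external-size bookkeeping the message describes (add on attach, migrate on promotion, subtract on run or detach) can be sketched in a toy form; `Space`, `Accounting`, and the function names below are illustrative stand-ins, not the VM's API:

```cpp
// Toy model: one external-size counter per heap space, as tracked for
// values with native finalizers attached.
enum Space { kNew = 0, kOld = 1 };

struct Accounting {
  long external[2] = {0, 0};  // bytes attributed to new and old space
};

// Attaching a native finalizer adds the external size to the space the
// finalizable value currently lives in.
void Attach(Accounting* heap, Space space, long size) {
  heap->external[space] += size;
}

// When the value is promoted from new to old space, the external size
// moves with it.
void Promote(Accounting* heap, long size) {
  heap->external[kNew] -= size;
  heap->external[kOld] += size;
}

// Running the finalizer or detaching it removes the external size from
// the relevant space.
void RunOrDetach(Accounting* heap, Space space, long size) {
  heap->external[space] -= size;
}
```

This kind of accounting lets the GC factor externally held memory into its scheduling decisions for each space.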
      const bool implements_finalizable =
          Class::ImplementsFinalizable(class_table_->At(cid));
      if (implements_finalizable) {
        exception_msg_ = OS::SCreate(
            zone_,
            "Illegal argument in isolate message: (object implements "
            "Finalizable - %s)",
            Class::Handle(class_table_->At(cid)).ToCString());
        return false;
      }
      return true;
    }

#define HANDLE_ILLEGAL_CASE(Type)                                              \
  case k##Type##Cid: {                                                         \
    exception_msg_ =                                                           \
        "Illegal argument in isolate message: "                                \
        "(object is a " #Type ")";                                             \
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x) | 3.03 us (733.74 x) | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
    return false; \
  }

  switch (cid) {
    // From "dart:ffi" we handle only Pointer/DynamicLibrary specially, since
    // those are the only non-abstract classes (so we avoid checking more cids
    // here that cannot happen in reality)
    HANDLE_ILLEGAL_CASE(DynamicLibrary)
Reland "[vm] Implement `Finalizer`"
Original CL in patchset 1.
Split-off https://dart-review.googlesource.com/c/sdk/+/238341
And pulled in fix https://dart-review.googlesource.com/c/sdk/+/238582
(Should merge cleanly when this lands later.)
This CL implements the `Finalizer` in the GC.
The GC is specially aware of two types of objects for the purposes of
running finalizers.
1) `FinalizerEntry`
2) `Finalizer` (`FinalizerBase`, `_FinalizerImpl`)
A `FinalizerEntry` contains the `value`, the optional `detach` key,
the `token`, and a reference to the `finalizer`.
An entry holds on only weakly to the value, detach key, and finalizer.
(Similar to how a `WeakReference` holds on only weakly to its target.)
A `Finalizer` contains all entries, a list of entries whose values have
been collected, and a reference to the isolate.
When the value of an entry is GCed, the entry is moved over to the
collected list.
If any entry is moved to the collected list, a message is sent that
invokes the finalizer to call the callback on all entries in that list.
When a finalizer is detached by the user, the entry's token is set to
the entry itself and the entry is removed from the all-entries set.
This ensures that if the entry was already moved to the collected list,
the finalizer is not executed.
To speed up detaching, we use a weak map from detach keys to lists of
entries. This ensures entries can be GCed.
Both the scavenger and marker tasks process finalizer entries in
parallel.
Parallel tasks use an atomic exchange on the head of the collected
entries list, ensuring no entries get lost.
The mutator thread is guaranteed to be stopped when processing entries.
This ensures that we do not need barriers for moving entries into the
finalizers collected list.
Dart reads and replaces the collected entries list also with an atomic
exchange, ensuring the GC doesn't run in between a load/store.
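The atomic-exchange handling of the collected-entries list described above can be sketched as a lock-free singly linked list (a minimal sketch with invented names; the VM's scheme additionally relies on the mutator being stopped during GC processing, which this standalone sketch replaces with a CAS loop on the producer side).

```cpp
#include <atomic>
#include <cassert>

struct Entry {
  Entry* next = nullptr;
};

class CollectedList {
 public:
  // Parallel GC tasks push collected entries concurrently; the CAS loop
  // on the head ensures no entries get lost.
  void Push(Entry* entry) {
    Entry* old_head = head_.load(std::memory_order_relaxed);
    do {
      entry->next = old_head;
    } while (!head_.compare_exchange_weak(old_head, entry,
                                          std::memory_order_release,
                                          std::memory_order_relaxed));
  }

  // The consumer takes the whole list with a single atomic exchange, so
  // nothing can interleave between a separate load and store.
  Entry* TakeAll() {
    return head_.exchange(nullptr, std::memory_order_acquire);
  }

 private:
  std::atomic<Entry*> head_{nullptr};
};
```

The key design point is that the reader never walks the live list: it detaches the entire list in one exchange and then processes it privately.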
When a finalizer gets posted a message to process finalized objects, it
is being kept alive by the message.
An alternative design would be to pre-allocate a `WeakReference` in the
finalizer pointing to the finalizer, and send that itself.
This would be at the cost of an extra object.
Send-and-exit is not supported in this CL; support will be added in a
follow-up CL. Trying to send will throw.
Bug: https://github.com/dart-lang/sdk/issues/47777
TEST=runtime/tests/vm/dart/finalizer/*
TEST=runtime/tests/vm/dart_2/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
Change-Id: Ibdfeadc16d5d69ade50aae5b9f794284c4c4dbab
Cq-Include-Trybots: luci.dart.try:vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-ffi-android-debug-arm64c-try,dart-sdk-mac-arm64-try,vm-kernel-mac-release-arm64-try,pkg-mac-release-arm64-try,vm-kernel-precomp-nnbd-mac-release-arm64-try,vm-kernel-win-debug-x64c-try,vm-kernel-win-debug-x64-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-nnbd-win-release-ia32-try,vm-ffi-android-debug-arm-try,vm-precomp-ffi-qemu-linux-release-arm-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-ia32-try,benchmark-linux-try,flutter-analyze-try,flutter-frontend-try,pkg-linux-debug-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-linux-debug-simarm_x64-try,vm-kernel-precomp-obfuscate-linux-release-x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/238086
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-03-25 10:29:30 +00:00
    HANDLE_ILLEGAL_CASE(Finalizer)
[vm] Implement `NativeFinalizer`
This CL implements `NativeFinalizer` in the GC.
`FinalizerEntry`s are extended to track `external_size` and which
`Heap::Space` the finalizable value is in.
On attaching a native finalizer, the external size is added to the
relevant heap. When the finalizable value is promoted from new to old
space, the external size is promoted as well. And when a native
finalizer is run or is detached, the external size is removed from the
relevant heap again.
In contrast to Dart `Finalizer`s, `NativeFinalizer`s are run on isolate
shutdown.
When the `NativeFinalizer`s themselves are collected, the finalizers are
not run. Users should stick the native finalizer in a global variable to
ensure finalization. We will revisit this design when we add send and
exit support, because there is a design space to explore what to do in
that case. This current solution promises the least to users.
In this implementation native finalizers have a Dart entry to clean up
the entries from the `all_entries` field of the finalizer. We should
consider using another data structure that avoids the need for this Dart
entry. See the TODO left in the code.
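The external-size bookkeeping described above can be sketched as follows (a hypothetical, simplified model with invented names, not the VM's actual API): sizes are tracked per heap space, added on attach, moved on promotion, and removed when the finalizer runs or is detached.

```cpp
#include <cassert>
#include <cstdint>

enum Space { kNew = 0, kOld = 1 };

// Hypothetical per-space external size accounting.
struct ExternalSizeAccounting {
  int64_t external_size[2] = {0, 0};

  // Attaching a native finalizer adds the external size to the space
  // currently holding the finalizable value.
  void OnAttach(Space space, int64_t size) { external_size[space] += size; }

  // When the value is promoted from new to old space, the external size
  // is promoted as well.
  void OnPromote(int64_t size) {
    external_size[kNew] -= size;
    external_size[kOld] += size;
  }

  // Running or detaching the native finalizer removes the external size
  // from the relevant space again.
  void OnRunOrDetach(Space space, int64_t size) {
    external_size[space] -= size;
  }
};
```

Keeping the counters balanced this way means the heap's GC heuristics always see the external memory attributed to the space that actually holds the owning value.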
Bug: https://github.com/dart-lang/sdk/issues/47777
TEST=runtime/tests/vm/dart(_2)/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/ffi(_2)/vmspecific_native_finalizer_*
Change-Id: I8f594c80c3c344ad83e1f2de10de028eb8456121
Cq-Include-Trybots: luci.dart.try:vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-ffi-android-debug-arm64c-try,dart-sdk-mac-arm64-try,vm-kernel-mac-release-arm64-try,pkg-mac-release-arm64-try,vm-kernel-precomp-nnbd-mac-release-arm64-try,vm-kernel-win-debug-x64c-try,vm-kernel-win-debug-x64-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-nnbd-win-release-ia32-try,vm-ffi-android-debug-arm-try,vm-precomp-ffi-qemu-linux-release-arm-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-ia32-try,benchmark-linux-try,flutter-frontend-try,pkg-linux-debug-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-linux-debug-simarm_x64-try,vm-kernel-precomp-obfuscate-linux-release-x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/236320
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-03-26 09:41:21 +00:00
    HANDLE_ILLEGAL_CASE(NativeFinalizer)
    HANDLE_ILLEGAL_CASE(MirrorReference)
    HANDLE_ILLEGAL_CASE(Pointer)
    HANDLE_ILLEGAL_CASE(ReceivePort)
    HANDLE_ILLEGAL_CASE(SuspendState)
    HANDLE_ILLEGAL_CASE(UserTag)
      default:
        return true;
    }
  }

  Thread* thread_;
  uword heap_base_;
  Zone* zone_;
  Heap* heap_;
  ClassTable* class_table_;
  Scavenger* new_space_;
  Object& tmp_;
  Object& to_;
  intptr_t expando_cid_;
  const char* exception_msg_ = nullptr;
};
class FastObjectCopyBase : public ObjectCopyBase {
 public:
  using Types = PtrTypes;

  explicit FastObjectCopyBase(Thread* thread)
      : ObjectCopyBase(thread), fast_forward_map_(thread) {}

 protected:
  DART_FORCE_INLINE
  void ForwardCompressedPointers(ObjectPtr src,
                                 ObjectPtr dst,
                                 intptr_t offset,
                                 intptr_t end_offset) {
    for (; offset < end_offset; offset += kCompressedWordSize) {
      ForwardCompressedPointer(src, dst, offset);
    }
  }

  DART_FORCE_INLINE
  void ForwardCompressedPointers(ObjectPtr src,
                                 ObjectPtr dst,
                                 intptr_t offset,
                                 intptr_t end_offset,
                                 UnboxedFieldBitmap bitmap) {
    if (bitmap.IsEmpty()) {
      ForwardCompressedPointers(src, dst, offset, end_offset);
      return;
    }
    intptr_t bit = offset >> kCompressedWordSizeLog2;
    for (; offset < end_offset; offset += kCompressedWordSize) {
      if (bitmap.Get(bit++)) {
        StoreCompressedNonPointerWord(
            dst, offset, LoadCompressedNonPointerWord(src, offset));
      } else {
        ForwardCompressedPointer(src, dst, offset);
      }
    }
  }

  void ForwardCompressedArrayPointers(intptr_t array_length,
                                      ObjectPtr src,
                                      ObjectPtr dst,
                                      intptr_t offset,
                                      intptr_t end_offset) {
    for (; offset < end_offset; offset += kCompressedWordSize) {
      ForwardCompressedPointer(src, dst, offset);
    }
  }

  void ForwardCompressedContextPointers(intptr_t context_length,
                                        ObjectPtr src,
                                        ObjectPtr dst,
                                        intptr_t offset,
                                        intptr_t end_offset) {
    for (; offset < end_offset; offset += kCompressedWordSize) {
      ForwardCompressedPointer(src, dst, offset);
    }
  }
  DART_FORCE_INLINE
  void ForwardCompressedPointer(ObjectPtr src, ObjectPtr dst, intptr_t offset) {
    auto value = LoadCompressedPointer(src, offset);
    if (!value.IsHeapObject()) {
      // Smis are immediate values and can be stored as-is.
      StoreCompressedPointerNoBarrier(dst, offset, value);
      return;
    }
    auto value_decompressed = value.Decompress(heap_base_);
    const uword tags = TagsFromUntaggedObject(value_decompressed.untag());
    if (CanShareObject(value_decompressed, tags)) {
      StoreCompressedPointerNoBarrier(dst, offset, value);
      return;
    }

    ObjectPtr existing_to =
        fast_forward_map_.ForwardedObject(value_decompressed);
    if (existing_to != Marker()) {
      StoreCompressedPointerNoBarrier(dst, offset, existing_to);
      return;
    }

    if (UNLIKELY(!CanCopyObject(tags, value_decompressed))) {
      ASSERT(exception_msg_ != nullptr);
      StoreCompressedPointerNoBarrier(dst, offset, Object::null());
      return;
    }

    auto to = Forward(tags, value_decompressed);
    StoreCompressedPointerNoBarrier(dst, offset, to);
  }

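// ForwardCompressedPointer above records every from->to mapping (the VM uses
// a WeakTable via fast_forward_map_) so that shared and cyclic references all
// resolve to a single copy, and each to-object pointer slot is written exactly
// once. An illustrative standalone sketch of that BFS copy-with-forwarding-map
// pattern (Node and CopyGraph are hypothetical, not VM types):
//
// ```cpp
// #include <cassert>
// #include <queue>
// #include <unordered_map>
//
// struct Node {
//   int value;
//   Node* next;
// };
//
// Node* CopyGraph(Node* root) {
//   if (root == nullptr) return nullptr;
//   std::unordered_map<Node*, Node*> forwarding;  // plays the WeakTable's role
//   std::queue<Node*> worklist;                   // BFS order, like the VM copy
//   Node* root_copy = new Node{root->value, nullptr};
//   forwarding[root] = root_copy;
//   worklist.push(root);
//   while (!worklist.empty()) {
//     Node* from = worklist.front();
//     worklist.pop();
//     Node* to = forwarding[from];
//     if (from->next != nullptr) {
//       auto it = forwarding.find(from->next);
//       if (it == forwarding.end()) {
//         Node* next_copy = new Node{from->next->value, nullptr};
//         forwarding[from->next] = next_copy;
//         worklist.push(from->next);
//         to->next = next_copy;   // write the forwarded pointer once
//       } else {
//         to->next = it->second;  // already forwarded (shared or cyclic edge)
//       }
//     }
//   }
//   return root_copy;
// }
//
// int main() {
//   // Two-node cycle: a -> b -> a.
//   Node b{2, nullptr};
//   Node a{1, &b};
//   b.next = &a;
//   Node* copy = CopyGraph(&a);
//   assert(copy->value == 1);
//   assert(copy->next->value == 2);
//   assert(copy->next->next == copy);  // cycle preserved, each node copied once
//   return 0;
// }
// ```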
  ObjectPtr Forward(uword tags, ObjectPtr from) {
    const intptr_t header_size = UntaggedObject::SizeTag::decode(tags);
    const auto cid = UntaggedObject::ClassIdTag::decode(tags);
    const uword size =
        header_size != 0 ? header_size : from.untag()->HeapSize();
    if (Heap::IsAllocatableInNewSpace(size)) {
      // Fast path: bump-allocate in new space without safepoint checks.
      const uword alloc = new_space_->TryAllocateNoSafepoint(thread_, size);
      if (alloc != 0) {
        ObjectPtr to(reinterpret_cast<UntaggedObject*>(alloc));
        fast_forward_map_.Insert(from, to, size);
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
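The scavenge-like copy with a side forwarding table that the message above describes can be sketched as a standalone model (hypothetical `Obj`/`CopyGraph`, not VM code): space for each to-object is reserved on first visit, its entry is recorded in the forwarding table, and its slots are written exactly once, with already-forwarded pointers, in BFS order.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical object model: an object is just a vector of pointer slots.
struct Obj { std::vector<Obj*> slots; };

Obj* CopyGraph(Obj* root) {
  std::unordered_map<Obj*, Obj*> forwarding;  // Stands in for the [WeakTable].
  std::vector<Obj*> worklist;
  auto forward = [&](Obj* from) -> Obj* {
    if (from == nullptr) return nullptr;
    auto it = forwarding.find(from);
    if (it != forwarding.end()) return it->second;
    Obj* to = new Obj();                 // Only reserve space for the to-object.
    to->slots.resize(from->slots.size());
    forwarding.emplace(from, to);
    worklist.push_back(from);            // Contents are filled in later (BFS).
    return to;
  };
  forward(root);
  for (size_t i = 0; i < worklist.size(); i++) {  // BFS order.
    Obj* from = worklist[i];
    Obj* to = forwarding[from];
    for (size_t s = 0; s < from->slots.size(); s++) {
      // Each to-object is written exactly once, with forwarded pointers.
      to->slots[s] = forward(from->slots[s]);
    }
  }
  return forwarding[root];
}
```

Cycles are handled naturally: a back edge finds its target already in the forwarding table and reuses the reserved to-object.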
        if (IsExternalTypedDataClassId(cid)) {
          SetNewSpaceTaggingWord(to, cid, header_size);
          InitializeExternalTypedData(cid, ExternalTypedData::RawCast(from),
                                      ExternalTypedData::RawCast(to));
          fast_forward_map_.AddExternalTypedData(
              ExternalTypedData::RawCast(to));
        } else if (IsTypedDataViewClassId(cid)) {
          // We set the view's backing store to `null` to satisfy an assertion
          // in GCCompactor::VisitTypedDataViewPointers().
          SetNewSpaceTaggingWord(to, cid, header_size);
          InitializeTypedDataView(TypedDataView::RawCast(to));
        }
        return to;
      }
    }
    exception_msg_ = kFastAllocationFailed;
    return Marker();
  }

  void EnqueueTransferable(TransferableTypedDataPtr from,
                           TransferableTypedDataPtr to) {
    fast_forward_map_.AddTransferable(from, to);
  }
  void EnqueueWeakProperty(WeakPropertyPtr from) {
    fast_forward_map_.AddWeakProperty(from);
  }
[vm] Implement `WeakReference` in the VM
This CL implements `WeakReference` in the VM.
* This reduces the size of weak references from 2 objects using 8 words
to 1 object using 4 words.
* This makes loads of weak reference targets a single load instead of
two.
* This avoids the fix-point in the GC and message object copying for
weak references. (N.b. Weak references need to be processed _after_
the fix-point for weak properties.)
The semantics of weak references in messages is that their target gets
set to `null` if the target is not included in the message by a strong
reference.
The tests take particular care to exercise the case where a weak
reference's target is only kept alive because a weak property key is
alive and it refers to the target in its value. This exercises the fact
that weak references need to be processed last.
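The processing order this relies on can be modelled in a small standalone sketch (hypothetical types, not the VM implementation): weak properties run to a fix-point first, so a reachable key can still pull its value into the message; only afterwards are weak reference targets that were never copied nulled out.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical model: objects are ints; `copied` is the set of objects
// included in the message so far.
struct WeakProperty { int key; int value; };

void ProcessWeakPairs(std::unordered_set<int>& copied,
                      const std::vector<WeakProperty>& props,
                      std::vector<int>& weak_ref_targets) {
  bool changed = true;
  while (changed) {  // Fix-point over weak properties.
    changed = false;
    for (const auto& p : props) {
      if (copied.count(p.key) != 0 && copied.count(p.value) == 0) {
        copied.insert(p.value);  // Key alive => value joins the message.
        changed = true;
      }
    }
  }
  // Weak references are processed last: null any target not otherwise copied.
  for (int& t : weak_ref_targets) {
    if (copied.count(t) == 0) t = -1;  // -1 stands in for `null`.
  }
}
```

Running weak references before the fix-point would incorrectly null a target that a weak property value was about to keep alive.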
Does not add support for weak references in the app snapshot. It would
be dead code until we start using weak references in, for example, the
CFE.
This CL does not try to unify weak references and weak properties in
the GC or messaging (as proposed in go/dart-vm-weakreference), because
their semantics differ enough.
Closes: https://github.com/dart-lang/sdk/issues/48162
TEST=runtime/tests/vm/dart/finalizer/weak_reference_run_gc_test.dart
TEST=runtime/tests/vm/dart/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/lib/isolate/weak_reference_message_1_test.dart
TEST=tests/lib/isolate/weak_reference_message_2_test.dart
Change-Id: I3810e919a5866f3ae8a95bd9aa23a880a0b0921c
Cq-Include-Trybots: luci.dart.try:app-kernel-linux-debug-x64-try,dart-sdk-mac-arm64-try,vm-canary-linux-debug-try,vm-fuchsia-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-linux-debug-simriscv64-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-x64-try,vm-kernel-nnbd-linux-release-ia32-try,vm-kernel-nnbd-linux-release-simarm64-try,vm-kernel-nnbd-linux-release-simarm-try,vm-kernel-nnbd-mac-debug-arm64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-win-release-ia32-try,vm-kernel-nnbd-win-release-x64-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-android-release-arm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-kernel-win-debug-ia32-try,vm-kernel-win-debug-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/232087
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-02-10 21:59:41 +00:00
  void EnqueueWeakReference(WeakReferencePtr from) {
    fast_forward_map_.AddWeakReference(from);
  }
  void EnqueueObjectToRehash(ObjectPtr to) {
    fast_forward_map_.AddObjectToRehash(to);
  }

  void EnqueueExpandoToRehash(ObjectPtr to) {
    fast_forward_map_.AddExpandoToRehash(to);
  }
  static void StoreCompressedArrayPointers(intptr_t array_length,
                                           ObjectPtr src,
                                           ObjectPtr dst,
                                           intptr_t offset,
                                           intptr_t end_offset) {
    StoreCompressedPointers(src, dst, offset, end_offset);
  }

  static void StoreCompressedPointers(ObjectPtr src,
                                      ObjectPtr dst,
                                      intptr_t offset,
                                      intptr_t end_offset) {
    StoreCompressedPointersNoBarrier(src, dst, offset, end_offset);
  }

  static void StoreCompressedPointersNoBarrier(ObjectPtr src,
                                               ObjectPtr dst,
                                               intptr_t offset,
                                               intptr_t end_offset) {
    for (; offset <= end_offset; offset += kCompressedWordSize) {
      StoreCompressedPointerNoBarrier(dst, offset,
                                      LoadCompressedPointer(src, offset));
    }
  }

 protected:
  friend class ObjectGraphCopier;

  FastForwardMap fast_forward_map_;
};
class SlowObjectCopyBase : public ObjectCopyBase {
 public:
  using Types = HandleTypes;

  explicit SlowObjectCopyBase(Thread* thread)
      : ObjectCopyBase(thread), slow_forward_map_(thread) {}

 protected:
  DART_FORCE_INLINE
  void ForwardCompressedPointers(const Object& src,
                                 const Object& dst,
                                 intptr_t offset,
                                 intptr_t end_offset) {
    for (; offset < end_offset; offset += kCompressedWordSize) {
      ForwardCompressedPointer(src, dst, offset);
    }
  }

  DART_FORCE_INLINE
  void ForwardCompressedPointers(const Object& src,
                                 const Object& dst,
                                 intptr_t offset,
                                 intptr_t end_offset,
                                 UnboxedFieldBitmap bitmap) {
    intptr_t bit = offset >> kCompressedWordSizeLog2;
    for (; offset < end_offset; offset += kCompressedWordSize) {
      if (bitmap.Get(bit++)) {
        StoreCompressedNonPointerWord(
            dst.ptr(), offset, LoadCompressedNonPointerWord(src.ptr(), offset));
      } else {
        ForwardCompressedPointer(src, dst, offset);
      }
    }
  }

  void ForwardCompressedArrayPointers(intptr_t array_length,
                                      const Object& src,
                                      const Object& dst,
                                      intptr_t offset,
                                      intptr_t end_offset) {
    if (Array::UseCardMarkingForAllocation(array_length)) {
      for (; offset < end_offset; offset += kCompressedWordSize) {
        ForwardCompressedLargeArrayPointer(src, dst, offset);
        thread_->CheckForSafepoint();
      }
    } else {
      for (; offset < end_offset; offset += kCompressedWordSize) {
        ForwardCompressedPointer(src, dst, offset);
      }
    }
  }

  void ForwardCompressedContextPointers(intptr_t context_length,
                                        const Object& src,
                                        const Object& dst,
                                        intptr_t offset,
                                        intptr_t end_offset) {
    for (; offset < end_offset; offset += kCompressedWordSize) {
      ForwardCompressedPointer(src, dst, offset);
    }
  }
[vm/concurrency] Implement a fast transitive object copy for isolate message passing
We use message passing as comunication mechanism between isolates.
The transitive closure of an object to be sent is currently serialized
into a snapshot form and deserialized on the receiver side. Furthermore
the receiver side will re-hash any linked hashmaps in that graph.
If isolate gropus are enabled we have all isolates in a group work on
the same heap. That removes the need to use an intermediate
serialization format. It also removes the need for an O(n) step on the
receiver side.
This CL implements a fast transitive object copy implementation and
makes use of it a message that is to be passed to another isolate stays
within the same isolate group.
In the common case the object graph will fit into new space. So the
copy algorithm will try to take advantage of it by having a fast path
and a fallback path. Both of them effectively copy the graph in BFS
order.
The algorithm works effectively like a scavenge operation, but instead
of first copying the from-object to the to-space and then re-writing the
object in to-space to forward the pointers (which requires us writing to
the to-space memory twice), we only reserve space for to-objects and
then initialize the to-objects to it's final contents, including
forwarded pointers (i.e. write the to-space object only once).
Compared with a scavenge operation (which stores forwarding pointers in
the objects themselves), we use a [WeakTable] to store them. This is the
only remaining expensive part of the algorithm and could be further
optimized. To avoid relying on iterating the to-space, we'll remember
[from, to] addresses.
=> All of this works inside a [NoSafepointOperationScope] and avoids
usages of handles as well as write barriers.
While doing the transitive object copy, we'll share any object we can
safely share (canonical objects, strings, sendports, ...) instead of
copying it.
If the fast path fails (due to allocation failure or hitting) we'll
handlify any raw pointers and continue almost the same algorithm in a
safe way, where GC is possible at every object allocation site and
normal barriers are used for any stores of object pointers.
The copy algorithm uses templates to share the copy logic between the
fast and slow case (same copy routines can work on raw pointers as well
as handles).
There's a few special things to take into consideration:
* If we copy a view on external typed data we need to know the
external typed data address to compute the inner pointer of the
view, so we'll eagerly initialize external typed data.
* All external typed data needs to get a finalizer attached
(irrespective if the object copy suceeds or not) to ensure the
`malloc()`ed data is freed again.
* Transferables will only be transferred on successful transitive
copies. Also they need to attach finalizers to objects (which
requires all objects to be in handles).
* We copy linked hashmaps as they are - instead of compressing the
data by removing deleted entries. We may need to re-hash those
hashmaps on the receiver side (similar to the snapshot-based copy
approach) since the new object graph will have no identity hash codes
assigned. Though if a hashmap only has sharable objects
as keys (very common, e.g. JSON) there is no need for re-hashing.
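The receiver-side re-hashing step can be sketched as follows; `Key`, `Rehash`, and the explicit forwarding map are illustrative assumptions, not VM code:

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical key type whose hash is its identity (its address), like
// a VM identity hash code that is tied to one particular object.
struct Key {
  int id;
};

// After the transitive copy, identity-hashed keys live in new objects,
// so a map built over the old identities must be re-hashed (rebuilt)
// under the forwarded identities on the receiver side.
std::unordered_map<const Key*, int> Rehash(
    const std::unordered_map<const Key*, int>& old_map,
    const std::unordered_map<const Key*, const Key*>& forward) {
  std::unordered_map<const Key*, int> fresh;
  for (const auto& entry : old_map) {
    fresh.emplace(forward.at(entry.first), entry.second);
  }
  return fresh;
}
```

Sharable keys (canonical objects, strings, ...) are not copied at all, so their hashes stay valid and this step can be skipped.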
It changes the SendPort.* benchmarks as follows:
```
Benchmark | default | IG | IG + FOC
----------------------------------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw): | 0.25 us (1 x) | 0.26 us (0.96 x) | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw): | 4.15 us (1 x) | 1.45 us (2.86 x) | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw): | 82.16 us (1 x) | 27.17 us (3.02 x) | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw): | 784.70 us (1 x) | 242.10 us (3.24 x) | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw): | 8510.4 us (1 x) | 3083.80 us (2.76 x) | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw): | 122381.33 us (1 x) | 62959.40 us (1.94 x) | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw): | 1.91 us (1 x) | 0.92 us (2.08 x) | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw): | 6.32 us (1 x) | 2.70 us (2.34 x) | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw): | 25.24 us (1 x) | 10.47 us (2.41 x) | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw): | 104.08 us (1 x) | 41.08 us (2.53 x) | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw): | 373.39 us (1 x) | 174.11 us (2.14 x) | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw): | 1588.64 us (1 x) | 893.18 us (1.78 x) | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw): | 6849.55 us (1 x) | 3705.19 us (1.85 x) | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw): | 0.67 us (1 x) | 0.69 us (0.97 x) | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw): | 4.37 us (1 x) | 0.78 us (5.60 x) | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw): | 45.67 us (1 x) | 0.90 us (50.74 x) | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw): | 498.81 us (1 x) | 1.24 us (402.27 x) | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw): | 5366.02 us (1 x) | 4.22 us (1271.57 x) | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw): | 101050.88 us (1 x) | 20.81 us (4855.88 x) | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw): | 3.91 us (1 x) | 0.76 us (5.14 x) | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw): | 9.90 us (1 x) | 0.79 us (12.53 x) | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw): | 33.09 us (1 x) | 0.87 us (38.03 x) | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw): | 126.77 us (1 x) | 0.92 us (137.79 x) | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x) | 0.94 us (567.12 x) | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x) | 3.03 us (733.74 x) | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
```
DART_FORCE_INLINE
void ForwardCompressedLargeArrayPointer(const Object& src,
                                        const Object& dst,
                                        intptr_t offset) {
  auto value = LoadCompressedPointer(src.ptr(), offset);
  if (!value.IsHeapObject()) {
    StoreCompressedPointerNoBarrier(dst.ptr(), offset, value);
    return;
  }

  auto value_decompressed = value.Decompress(heap_base_);
  const uword tags = TagsFromUntaggedObject(value_decompressed.untag());
  if (CanShareObject(value_decompressed, tags)) {
    StoreCompressedLargeArrayPointerBarrier(dst.ptr(), offset,
                                            value_decompressed);
    return;
  }

  ObjectPtr existing_to =
      slow_forward_map_.ForwardedObject(value_decompressed);
  if (existing_to != Marker()) {
    StoreCompressedLargeArrayPointerBarrier(dst.ptr(), offset, existing_to);
    return;
  }

  if (UNLIKELY(!CanCopyObject(tags, value_decompressed))) {
    ASSERT(exception_msg_ != nullptr);
    StoreCompressedLargeArrayPointerBarrier(dst.ptr(), offset,
                                            Object::null());
    return;
  }

  tmp_ = value_decompressed;
  tmp_ = Forward(tags, tmp_);  // Only this can cause allocation.
  StoreCompressedLargeArrayPointerBarrier(dst.ptr(), offset, tmp_.ptr());
}
```
```
DART_FORCE_INLINE
void ForwardCompressedPointer(const Object& src,
                              const Object& dst,
                              intptr_t offset) {
  auto value = LoadCompressedPointer(src.ptr(), offset);
  if (!value.IsHeapObject()) {
    StoreCompressedPointerNoBarrier(dst.ptr(), offset, value);
    return;
  }
  auto value_decompressed = value.Decompress(heap_base_);
  const uword tags = TagsFromUntaggedObject(value_decompressed.untag());
  if (CanShareObject(value_decompressed, tags)) {
    StoreCompressedPointerBarrier(dst.ptr(), offset, value_decompressed);
    return;
  }

  ObjectPtr existing_to =
      slow_forward_map_.ForwardedObject(value_decompressed);
  if (existing_to != Marker()) {
    StoreCompressedPointerBarrier(dst.ptr(), offset, existing_to);
    return;
  }

  if (UNLIKELY(!CanCopyObject(tags, value_decompressed))) {
    ASSERT(exception_msg_ != nullptr);
    StoreCompressedPointerNoBarrier(dst.ptr(), offset, Object::null());
    return;
  }

  tmp_ = value_decompressed;
  tmp_ = Forward(tags, tmp_);  // Only this can cause allocation.
  StoreCompressedPointerBarrier(dst.ptr(), offset, tmp_.ptr());
}
```
|
2021-09-02 19:45:55 +00:00
|
|
|
|
[vm/concurrency] Implement a fast transitive object copy for isolate message passing
We use message passing as comunication mechanism between isolates.
The transitive closure of an object to be sent is currently serialized
into a snapshot form and deserialized on the receiver side. Furthermore
the receiver side will re-hash any linked hashmaps in that graph.
If isolate gropus are enabled we have all isolates in a group work on
the same heap. That removes the need to use an intermediate
serialization format. It also removes the need for an O(n) step on the
receiver side.
This CL implements a fast transitive object copy implementation and
makes use of it a message that is to be passed to another isolate stays
within the same isolate group.
In the common case the object graph will fit into new space. So the
copy algorithm will try to take advantage of it by having a fast path
and a fallback path. Both of them effectively copy the graph in BFS
order.
The algorithm works effectively like a scavenge operation, but instead
of first copying the from-object to the to-space and then re-writing the
object in to-space to forward the pointers (which requires us writing to
the to-space memory twice), we only reserve space for to-objects and
then initialize the to-objects to it's final contents, including
forwarded pointers (i.e. write the to-space object only once).
Compared with a scavenge operation (which stores forwarding pointers in
the objects themselves), we use a [WeakTable] to store them. This is the
only remaining expensive part of the algorithm and could be further
optimized. To avoid relying on iterating the to-space, we'll remember
[from, to] addresses.
=> All of this works inside a [NoSafepointOperationScope] and avoids
usages of handles as well as write barriers.
While doing the transitive object copy, we'll share any object we can
safely share (canonical objects, strings, sendports, ...) instead of
copying it.
If the fast path fails (due to allocation failure or hitting) we'll
handlify any raw pointers and continue almost the same algorithm in a
safe way, where GC is possible at every object allocation site and
normal barriers are used for any stores of object pointers.
The copy algorithm uses templates to share the copy logic between the
fast and slow case (same copy routines can work on raw pointers as well
as handles).
There's a few special things to take into consideration:
* If we copy a view on external typed data we need to know the
external typed data address to compute the inner pointer of the
view, so we'll eagerly initialize external typed data.
* All external typed data needs to get a finalizer attached
(irrespective if the object copy suceeds or not) to ensure the
`malloc()`ed data is freed again.
* Transferables will only be transferred on successful transitive
copies. Also they need to attach finalizers to objects (which
requires all objects be in handles).
* We copy linked hashmaps as they are - instead of compressing the
data by removing deleted entries. We may need to re-hash those
hashmaps on the receiver side (similar to the snapshot-based copy
approach) since new object graph will have no identity hash codes
assigned to them. Though if the hashmaps only has sharable objects
as keys (very common, e.g. json) there is no need for re-hashing.
It changes the SendPort.* benchmarks as follows:
```
Benchmark | default | IG | IG + FOC
----------------------------------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw): | 0.25 us (1 x) | 0.26 us (0.96 x) | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw): | 4.15 us (1 x) | 1.45 us (2.86 x) | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw): | 82.16 us (1 x) | 27.17 us (3.02 x) | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw): | 784.70 us (1 x) | 242.10 us (3.24 x) | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw): | 8510.4 us (1 x) | 3083.80 us (2.76 x) | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw): | 122381.33 us (1 x) | 62959.40 us (1.94 x) | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw): | 1.91 us (1 x) | 0.92 us (2.08 x) | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw): | 6.32 us (1 x) | 2.70 us (2.34 x) | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw): | 25.24 us (1 x) | 10.47 us (2.41 x) | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw): | 104.08 us (1 x) | 41.08 us (2.53 x) | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw): | 373.39 us (1 x) | 174.11 us (2.14 x) | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw): | 1588.64 us (1 x) | 893.18 us (1.78 x) | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw): | 6849.55 us (1 x) | 3705.19 us (1.85 x) | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw): | 0.67 us (1 x) | 0.69 us (0.97 x) | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw): | 4.37 us (1 x) | 0.78 us (5.60 x) | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw): | 45.67 us (1 x) | 0.90 us (50.74 x) | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw): | 498.81 us (1 x) | 1.24 us (402.27 x) | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw): | 5366.02 us (1 x) | 4.22 us (1271.57 x) | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw): | 101050.88 us (1 x) | 20.81 us (4855.88 x) | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw): | 3.91 us (1 x) | 0.76 us (5.14 x) | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw): | 9.90 us (1 x) | 0.79 us (12.53 x) | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw): | 33.09 us (1 x) | 0.87 us (38.03 x) | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw): | 126.77 us (1 x) | 0.92 us (137.79 x) | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x) | 0.94 us (567.12 x) | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x) | 3.03 us (733.74 x) | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
|
|
|
ObjectPtr Forward(uword tags, const Object& from) {
|
|
|
|
const intptr_t cid = UntaggedObject::ClassIdTag::decode(tags);
|
|
|
|
intptr_t size = UntaggedObject::SizeTag::decode(tags);
|
|
|
|
if (size == 0) {
|
|
|
|
size = from.ptr().untag()->HeapSize();
|
|
|
|
}
|
2022-06-01 17:52:28 +00:00
|
|
|
to_ = AllocateObject(cid, size);
|
|
|
|
UpdateLengthField(cid, from.ptr(), to_.ptr());
|
|
|
|
slow_forward_map_.Insert(from, to_, size); // SAFEPOINT
|
|
|
|
ObjectPtr to = to_.ptr();
|
[vm/concurrency] Implement a fast transitive object copy for isolate message passing
We use message passing as comunication mechanism between isolates.
The transitive closure of an object to be sent is currently serialized
into a snapshot form and deserialized on the receiver side. Furthermore
the receiver side will re-hash any linked hashmaps in that graph.
If isolate gropus are enabled we have all isolates in a group work on
the same heap. That removes the need to use an intermediate
serialization format. It also removes the need for an O(n) step on the
receiver side.
This CL implements a fast transitive object copy implementation and
makes use of it a message that is to be passed to another isolate stays
within the same isolate group.
In the common case the object graph will fit into new space. So the
copy algorithm will try to take advantage of it by having a fast path
and a fallback path. Both of them effectively copy the graph in BFS
order.
The algorithm works effectively like a scavenge operation, but instead
of first copying the from-object to the to-space and then re-writing the
object in to-space to forward the pointers (which requires us writing to
the to-space memory twice), we only reserve space for to-objects and
then initialize the to-objects to it's final contents, including
forwarded pointers (i.e. write the to-space object only once).
Compared with a scavenge operation (which stores forwarding pointers in
the objects themselves), we use a [WeakTable] to store them. This is the
only remaining expensive part of the algorithm and could be further
optimized. To avoid relying on iterating the to-space, we'll remember
[from, to] addresses.
=> All of this works inside a [NoSafepointOperationScope] and avoids
usages of handles as well as write barriers.
While doing the transitive object copy, we'll share any object we can
safely share (canonical objects, strings, sendports, ...) instead of
copying it.
If the fast path fails (due to allocation failure or hitting) we'll
handlify any raw pointers and continue almost the same algorithm in a
safe way, where GC is possible at every object allocation site and
normal barriers are used for any stores of object pointers.
The copy algorithm uses templates to share the copy logic between the
fast and slow case (same copy routines can work on raw pointers as well
as handles).
There's a few special things to take into consideration:
* If we copy a view on external typed data we need to know the
external typed data address to compute the inner pointer of the
view, so we'll eagerly initialize external typed data.
* All external typed data needs to get a finalizer attached
(irrespective if the object copy suceeds or not) to ensure the
`malloc()`ed data is freed again.
* Transferables will only be transferred on successful transitive
copies. Also they need to attach finalizers to objects (which
requires all objects be in handles).
* We copy linked hashmaps as they are - instead of compressing the
data by removing deleted entries. We may need to re-hash those
hashmaps on the receiver side (similar to the snapshot-based copy
approach) since new object graph will have no identity hash codes
assigned to them. Though if the hashmaps only has sharable objects
as keys (very common, e.g. json) there is no need for re-hashing.
It changes the SendPort.* benchmarks as follows:
```
Benchmark | default | IG | IG + FOC
----------------------------------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw): | 0.25 us (1 x) | 0.26 us (0.96 x) | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw): | 4.15 us (1 x) | 1.45 us (2.86 x) | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw): | 82.16 us (1 x) | 27.17 us (3.02 x) | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw): | 784.70 us (1 x) | 242.10 us (3.24 x) | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw): | 8510.4 us (1 x) | 3083.80 us (2.76 x) | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw): | 122381.33 us (1 x) | 62959.40 us (1.94 x) | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw): | 1.91 us (1 x) | 0.92 us (2.08 x) | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw): | 6.32 us (1 x) | 2.70 us (2.34 x) | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw): | 25.24 us (1 x) | 10.47 us (2.41 x) | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw): | 104.08 us (1 x) | 41.08 us (2.53 x) | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw): | 373.39 us (1 x) | 174.11 us (2.14 x) | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw): | 1588.64 us (1 x) | 893.18 us (1.78 x) | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw): | 6849.55 us (1 x) | 3705.19 us (1.85 x) | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw): | 0.67 us (1 x) | 0.69 us (0.97 x) | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw): | 4.37 us (1 x) | 0.78 us (5.60 x) | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw): | 45.67 us (1 x) | 0.90 us (50.74 x) | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw): | 498.81 us (1 x) | 1.24 us (402.27 x) | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw): | 5366.02 us (1 x) | 4.22 us (1271.57 x) | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw): | 101050.88 us (1 x) | 20.81 us (4855.88 x) | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw): | 3.91 us (1 x) | 0.76 us (5.14 x) | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw): | 9.90 us (1 x) | 0.79 us (12.53 x) | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw): | 33.09 us (1 x) | 0.87 us (38.03 x) | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw): | 126.77 us (1 x) | 0.92 us (137.79 x) | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x) | 0.94 us (567.12 x) | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x) | 3.03 us (733.74 x) | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
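The write-once BFS copy described in the commit message above can be sketched outside the VM. The following is a minimal illustration in plain C++ (hypothetical `Node` objects, not VM code): a side table of forwarding pointers stands in for the [WeakTable], and a worklist stands in for walking the reserved to-space region.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Node {
  int value = 0;
  Node* left = nullptr;   // the object's pointer slots
  Node* right = nullptr;
};

// Copies the transitive closure of `root` in BFS order. Each to-object is
// written exactly once, with its final (already forwarded) contents.
// (Copies are intentionally leaked here for brevity.)
Node* CopyGraph(Node* root) {
  if (root == nullptr) return nullptr;
  std::unordered_map<Node*, Node*> forward;  // from-object -> to-object
  std::vector<Node*> worklist;

  auto forward_to = [&](Node* from) -> Node* {
    if (from == nullptr) return nullptr;
    auto it = forward.find(from);
    if (it != forward.end()) return it->second;
    Node* to = new Node();  // only reserve space; contents are filled later
    forward.emplace(from, to);
    worklist.push_back(from);
    return to;
  };

  Node* to_root = forward_to(root);
  for (size_t i = 0; i < worklist.size(); ++i) {
    Node* from = worklist[i];
    Node* to = forward[from];
    // Initialize the to-object in a single pass, forwarding pointer slots.
    to->value = from->value;
    to->left = forward_to(from->left);
    to->right = forward_to(from->right);
  }
  return to_root;
}
```

Note that shared sub-objects in the from-graph stay shared in the to-graph, because the forwarding table is consulted before any new to-object is reserved.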
    if (cid == kArrayCid && !Heap::IsAllocatableInNewSpace(size)) {
      to.untag()->SetCardRememberedBitUnsynchronized();
    }
    if (IsExternalTypedDataClassId(cid)) {
      const auto& external_to = slow_forward_map_.AddExternalTypedData(
          ExternalTypedData::RawCast(to));
      InitializeExternalTypedDataWithSafepointChecks(
          thread_, cid, ExternalTypedData::Cast(from), external_to);
      return external_to.ptr();
    } else if (IsTypedDataViewClassId(cid)) {
      // We set the view's backing store to `null` to satisfy an assertion in
      // GCCompactor::VisitTypedDataViewPointers().
      InitializeTypedDataView(TypedDataView::RawCast(to));
    }
    return to;
  }

  void EnqueueTransferable(const TransferableTypedData& from,
                           const TransferableTypedData& to) {
    slow_forward_map_.AddTransferable(from, to);
  }

  void EnqueueWeakProperty(const WeakProperty& from) {
    slow_forward_map_.AddWeakProperty(from);
  }
[vm] Implement `WeakReference` in the VM
This CL implements `WeakReference` in the VM.
* This reduces the size of weak references from 2 objects using 8 words
to 1 object using 4 words.
* This makes loads of weak reference targets a single load instead of
two.
* This avoids the fix-point in the GC and message object copying for
weak references. (N.b. Weak references need to be processed _after_
the fix-point for weak properties.)
The semantics of weak references in messages is that their target gets
set to `null` if the target is not included in the message by a strong
reference.
The tests take particular care to exercise the case where a weak
reference's target is only kept alive because a weak property key is
alive and it refers to the target in its value. This exercises the fact
that weak references need to be processed last.
Does not add support for weak references in the app snapshot. It would
be dead code until we start using weak references in, for example, the
CFE.
This CL does not try to unify weak references and weak properties in
the GC or messaging (as proposed in go/dart-vm-weakreference), because
their semantics differ enough.
Closes: https://github.com/dart-lang/sdk/issues/48162
TEST=runtime/tests/vm/dart/finalizer/weak_reference_run_gc_test.dart
TEST=runtime/tests/vm/dart/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/lib/isolate/weak_reference_message_1_test.dart
TEST=tests/lib/isolate/weak_reference_message_2_test.dart
Change-Id: I3810e919a5866f3ae8a95bd9aa23a880a0b0921c
Cq-Include-Trybots: luci.dart.try:app-kernel-linux-debug-x64-try,dart-sdk-mac-arm64-try,vm-canary-linux-debug-try,vm-fuchsia-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-linux-debug-simriscv64-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-x64-try,vm-kernel-nnbd-linux-release-ia32-try,vm-kernel-nnbd-linux-release-simarm64-try,vm-kernel-nnbd-linux-release-simarm-try,vm-kernel-nnbd-mac-debug-arm64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-win-release-ia32-try,vm-kernel-nnbd-win-release-x64-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-android-release-arm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-kernel-win-debug-ia32-try,vm-kernel-win-debug-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/232087
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-02-10 21:59:41 +00:00
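The message semantics described above can be illustrated with a small sketch (hypothetical names, not the VM code): after the strong graph has been copied, each copied weak reference's target is forwarded if it was reached strongly, and nulled otherwise.

```cpp
#include <cassert>
#include <unordered_map>

struct Obj { int id; };
struct WeakRef { Obj* target; };

// `forward` holds the from->to entries produced by copying the strong
// graph. Weak references are fixed up afterwards (and, per the commit
// message, after the weak-property fix-point has completed).
void FixupWeakRef(WeakRef* copied_ref,
                  const std::unordered_map<Obj*, Obj*>& forward) {
  auto it = forward.find(copied_ref->target);
  copied_ref->target = (it == forward.end()) ? nullptr : it->second;
}
```

A target kept alive only via a weak-property value must already be in the forwarding table by this point, which is why weak references are processed last.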
  void EnqueueWeakReference(const WeakReference& from) {
    slow_forward_map_.AddWeakReference(from);
  }
  void EnqueueObjectToRehash(const Object& to) {
    slow_forward_map_.AddObjectToRehash(to);
  }

  void EnqueueExpandoToRehash(const Object& to) {
    slow_forward_map_.AddExpandoToRehash(to);
  }
  void StoreCompressedArrayPointers(intptr_t array_length,
                                    const Object& src,
                                    const Object& dst,
                                    intptr_t offset,
                                    intptr_t end_offset) {
    auto src_ptr = src.ptr();
    auto dst_ptr = dst.ptr();
    if (Array::UseCardMarkingForAllocation(array_length)) {
      for (; offset <= end_offset; offset += kCompressedWordSize) {
        StoreCompressedLargeArrayPointerBarrier(
            dst_ptr, offset,
            LoadCompressedPointer(src_ptr, offset).Decompress(heap_base_));
      }
    } else {
      for (; offset <= end_offset; offset += kCompressedWordSize) {
        StoreCompressedPointerBarrier(
            dst_ptr, offset,
            LoadCompressedPointer(src_ptr, offset).Decompress(heap_base_));
      }
    }
  }

  void StoreCompressedPointers(const Object& src,
                               const Object& dst,
                               intptr_t offset,
                               intptr_t end_offset) {
    auto src_ptr = src.ptr();
    auto dst_ptr = dst.ptr();
    for (; offset <= end_offset; offset += kCompressedWordSize) {
      StoreCompressedPointerBarrier(
          dst_ptr, offset,
          LoadCompressedPointer(src_ptr, offset).Decompress(heap_base_));
    }
  }

  static void StoreCompressedPointersNoBarrier(const Object& src,
                                               const Object& dst,
                                               intptr_t offset,
                                               intptr_t end_offset) {
    auto src_ptr = src.ptr();
    auto dst_ptr = dst.ptr();
    for (; offset <= end_offset; offset += kCompressedWordSize) {
      StoreCompressedPointerNoBarrier(dst_ptr, offset,
                                      LoadCompressedPointer(src_ptr, offset));
    }
  }

 protected:
  friend class ObjectGraphCopier;

  SlowForwardMap slow_forward_map_;
};

template <typename Base>
class ObjectCopy : public Base {
 public:
  using Types = typename Base::Types;

  explicit ObjectCopy(Thread* thread) : Base(thread) {}

  void CopyPredefinedInstance(typename Types::Object from,
                              typename Types::Object to,
                              intptr_t cid) {
    if (IsImplicitFieldClassId(cid)) {
      CopyUserdefinedInstance(from, to);
      return;
    }
    switch (cid) {
#define COPY_TO(clazz)                                                         \
  case clazz::kClassId: {                                                      \
    typename Types::clazz casted_from = Types::Cast##clazz(from);              \
    typename Types::clazz casted_to = Types::Cast##clazz(to);                  \
    Copy##clazz(casted_from, casted_to);                                       \
    return;                                                                    \
  }

      CLASS_LIST_NO_OBJECT_NOR_STRING_NOR_ARRAY_NOR_MAP(COPY_TO)
      COPY_TO(Array)
      COPY_TO(GrowableObjectArray)
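The `ObjectCopy<Base>` template above is what lets one set of `Copy##clazz` routines serve both the fast (raw pointer) and slow (handle) paths: `Base` supplies a `Types` policy, and the copy logic is written once against it. A simplified, self-contained illustration of this policy-template pattern (all types here are hypothetical):

```cpp
#include <cassert>

// "Raw" representation: objects are addressed directly.
struct RawTypes {
  using Object = int*;
  static int Value(Object o) { return *o; }
  static void SetValue(Object o, int v) { *o = v; }
};

// "Handle" representation: objects are reached through an indirection.
struct HandleTypes {
  struct Handle { int value; };
  using Object = Handle*;
  static int Value(Object o) { return o->value; }
  static void SetValue(Object o, int v) { o->value = v; }
};

// One copy routine, instantiated for either representation; the body is
// identical, only the accessors supplied by the policy differ.
template <typename Types>
struct Copier {
  static void Copy(typename Types::Object from, typename Types::Object to) {
    Types::SetValue(to, Types::Value(from));
  }
};
```

The same idea, scaled up, means the fast path pays no handle overhead while the slow path remains GC-safe, without duplicating the per-class copy logic.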
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
      COPY_TO(LinkedHashMap)
      COPY_TO(LinkedHashSet)
#undef COPY_TO

#define COPY_TO(clazz) case kTypedData##clazz##Cid:
      CLASS_LIST_TYPED_DATA(COPY_TO) {
        typename Types::TypedData casted_from = Types::CastTypedData(from);
        typename Types::TypedData casted_to = Types::CastTypedData(to);
        CopyTypedData(casted_from, casted_to);
        return;
      }
#undef COPY_TO

      case kByteDataViewCid:
#define COPY_TO(clazz) case kTypedData##clazz##ViewCid:
      CLASS_LIST_TYPED_DATA(COPY_TO) {
        typename Types::TypedDataView casted_from =
            Types::CastTypedDataView(from);
        typename Types::TypedDataView casted_to =
            Types::CastTypedDataView(to);
        CopyTypedDataView(casted_from, casted_to);
        return;
      }
#undef COPY_TO

#define COPY_TO(clazz) case kExternalTypedData##clazz##Cid:
      CLASS_LIST_TYPED_DATA(COPY_TO) {
        typename Types::ExternalTypedData casted_from =
            Types::CastExternalTypedData(from);
        typename Types::ExternalTypedData casted_to =
            Types::CastExternalTypedData(to);
        CopyExternalTypedData(casted_from, casted_to);
        return;
      }
#undef COPY_TO

      default:
        break;
    }

    const Object& obj = Types::HandlifyObject(from);
    FATAL1("Unexpected object: %s\n", obj.ToCString());
  }

#if defined(DART_PRECOMPILED_RUNTIME)
  void CopyUserdefinedInstanceAOT(typename Types::Object from,
                                  typename Types::Object to,
                                  UnboxedFieldBitmap bitmap) {
    const intptr_t instance_size = UntagObject(from)->HeapSize();
    Base::ForwardCompressedPointers(from, to, kWordSize, instance_size, bitmap);
  }
#endif

  void CopyUserdefinedInstance(typename Types::Object from,
                               typename Types::Object to) {
    const intptr_t instance_size = UntagObject(from)->HeapSize();
    Base::ForwardCompressedPointers(from, to, kWordSize, instance_size);
  }

  void CopyClosure(typename Types::Closure from, typename Types::Closure to) {
    Base::StoreCompressedPointers(
        from, to, OFFSET_OF(UntaggedClosure, instantiator_type_arguments_),
        OFFSET_OF(UntaggedClosure, function_));
    Base::ForwardCompressedPointer(from, to,
                                   OFFSET_OF(UntaggedClosure, context_));
    Base::StoreCompressedPointersNoBarrier(from, to,
                                           OFFSET_OF(UntaggedClosure, hash_),
                                           OFFSET_OF(UntaggedClosure, hash_));
    ONLY_IN_PRECOMPILED(UntagClosure(to)->entry_point_ =
                            UntagClosure(from)->entry_point_);
  }

  void CopyContext(typename Types::Context from, typename Types::Context to) {
    const intptr_t length = Context::NumVariables(Types::GetContextPtr(from));

    UntagContext(to)->num_variables_ = UntagContext(from)->num_variables_;

    Base::ForwardCompressedPointer(from, to,
                                   OFFSET_OF(UntaggedContext, parent_));
    Base::ForwardCompressedContextPointers(
        length, from, to, Context::variable_offset(0),
        Context::variable_offset(0) + Context::kBytesPerElement * length);
  }

  void CopyArray(typename Types::Array from, typename Types::Array to) {
    const intptr_t length = Smi::Value(UntagArray(from)->length());
    Base::StoreCompressedArrayPointers(
        length, from, to, OFFSET_OF(UntaggedArray, type_arguments_),
        OFFSET_OF(UntaggedArray, type_arguments_));
    Base::StoreCompressedPointersNoBarrier(from, to,
                                           OFFSET_OF(UntaggedArray, length_),
                                           OFFSET_OF(UntaggedArray, length_));
    Base::ForwardCompressedArrayPointers(
        length, from, to, Array::data_offset(),
        Array::data_offset() + kCompressedWordSize * length);
  }

  void CopyGrowableObjectArray(typename Types::GrowableObjectArray from,
                               typename Types::GrowableObjectArray to) {
    Base::StoreCompressedPointers(
        from, to, OFFSET_OF(UntaggedGrowableObjectArray, type_arguments_),
        OFFSET_OF(UntaggedGrowableObjectArray, type_arguments_));
    Base::StoreCompressedPointersNoBarrier(
        from, to, OFFSET_OF(UntaggedGrowableObjectArray, length_),
        OFFSET_OF(UntaggedGrowableObjectArray, length_));
    Base::ForwardCompressedPointer(
        from, to, OFFSET_OF(UntaggedGrowableObjectArray, data_));
  }

  template <intptr_t one_for_set_two_for_map, typename T>
  void CopyLinkedHashBase(T from,
                          T to,
                          UntaggedLinkedHashBase* from_untagged,
                          UntaggedLinkedHashBase* to_untagged) {
    // We have to find out whether the map needs re-hashing on the receiver
    // side due to keys being copied and the keys therefore possibly having
    // different hash codes (e.g. due to user-defined hashCode implementation
    // or due to new identity hash codes of the copied objects).
    bool needs_rehashing = false;
    ArrayPtr data = from_untagged->data_.Decompress(Base::heap_base_);
    if (data != Array::null()) {
      UntaggedArray* untagged_data = data.untag();
      const intptr_t length = Smi::Value(untagged_data->length_);
      auto key_value_pairs = untagged_data->data();
      for (intptr_t i = 0; i < length; i += one_for_set_two_for_map) {
        ObjectPtr key = key_value_pairs[i].Decompress(Base::heap_base_);
        const bool is_deleted_entry = key == data;
        if (key->IsHeapObject()) {
          if (!is_deleted_entry && MightNeedReHashing(key)) {
            needs_rehashing = true;
            break;
          }
        }
      }
    }

    Base::StoreCompressedPointers(
        from, to, OFFSET_OF(UntaggedLinkedHashBase, type_arguments_),
        OFFSET_OF(UntaggedLinkedHashBase, type_arguments_));

    // Compared with the snapshot-based (de)serializer we do preserve the same
    // backing store (i.e. used_data/deleted_keys/data) and therefore do not
    // magically shrink the backing store based on usage.
    //
    // We do this to avoid making assumptions about the object graph and the
    // linked hash map (e.g. assuming there are no other references to the
    // data, or assuming the linked hashmap is in a consistent state).
    if (needs_rehashing) {
      to_untagged->hash_mask_ = Smi::New(0);
      to_untagged->index_ = TypedData::RawCast(Object::null());
      to_untagged->deleted_keys_ = Smi::New(0);
SendPort.Receive.BinaryTree.4(RunTimeRaw): | 9.90 us (1 x) | 0.79 us (12.53 x) | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw): | 33.09 us (1 x) | 0.87 us (38.03 x) | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw): | 126.77 us (1 x) | 0.92 us (137.79 x) | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x) | 0.94 us (567.12 x) | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x) | 3.03 us (733.74 x) | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
      Base::EnqueueObjectToRehash(to);
    }

    // From this point on we shouldn't use the raw pointers, since GC might
    // happen when forwarding objects.
    from_untagged = nullptr;
    to_untagged = nullptr;

    if (!needs_rehashing) {
      Base::ForwardCompressedPointer(from, to,
                                     OFFSET_OF(UntaggedLinkedHashBase, index_));
      Base::StoreCompressedPointersNoBarrier(
          from, to, OFFSET_OF(UntaggedLinkedHashBase, hash_mask_),
          OFFSET_OF(UntaggedLinkedHashBase, hash_mask_));
      Base::StoreCompressedPointersNoBarrier(
          from, to, OFFSET_OF(UntaggedLinkedHashMap, deleted_keys_),
          OFFSET_OF(UntaggedLinkedHashMap, deleted_keys_));
    }

    Base::ForwardCompressedPointer(from, to,
                                   OFFSET_OF(UntaggedLinkedHashBase, data_));
    Base::StoreCompressedPointersNoBarrier(
        from, to, OFFSET_OF(UntaggedLinkedHashBase, used_data_),
        OFFSET_OF(UntaggedLinkedHashBase, used_data_));
  }

  void CopyLinkedHashMap(typename Types::LinkedHashMap from,
                         typename Types::LinkedHashMap to) {
    CopyLinkedHashBase<2, typename Types::LinkedHashMap>(
        from, to, UntagLinkedHashMap(from), UntagLinkedHashMap(to));
  }
  void CopyLinkedHashSet(typename Types::LinkedHashSet from,
                         typename Types::LinkedHashSet to) {
    CopyLinkedHashBase<1, typename Types::LinkedHashSet>(
        from, to, UntagLinkedHashSet(from), UntagLinkedHashSet(to));
  }

  void CopyDouble(typename Types::Double from, typename Types::Double to) {
#if !defined(DART_PRECOMPILED_RUNTIME)
    auto raw_from = UntagDouble(from);
    auto raw_to = UntagDouble(to);
    raw_to->value_ = raw_from->value_;
#else
    // Will be shared and not copied.
    UNREACHABLE();
#endif
  }

  void CopyFloat32x4(typename Types::Float32x4 from,
                     typename Types::Float32x4 to) {
#if !defined(DART_PRECOMPILED_RUNTIME)
    auto raw_from = UntagFloat32x4(from);
    auto raw_to = UntagFloat32x4(to);
    raw_to->value_[0] = raw_from->value_[0];
    raw_to->value_[1] = raw_from->value_[1];
    raw_to->value_[2] = raw_from->value_[2];
    raw_to->value_[3] = raw_from->value_[3];
#else
    // Will be shared and not copied.
    UNREACHABLE();
#endif
  }

  void CopyFloat64x2(typename Types::Float64x2 from,
                     typename Types::Float64x2 to) {
#if !defined(DART_PRECOMPILED_RUNTIME)
    auto raw_from = UntagFloat64x2(from);
    auto raw_to = UntagFloat64x2(to);
    raw_to->value_[0] = raw_from->value_[0];
    raw_to->value_[1] = raw_from->value_[1];
#else
    // Will be shared and not copied.
    UNREACHABLE();
#endif
  }

  void CopyTypedData(TypedDataPtr from, TypedDataPtr to) {
    auto raw_from = from.untag();
    auto raw_to = to.untag();
    const intptr_t cid = Types::GetTypedDataPtr(from)->GetClassId();
    raw_to->length_ = raw_from->length_;
    raw_to->RecomputeDataField();
    const intptr_t length =
        TypedData::ElementSizeInBytes(cid) * Smi::Value(raw_from->length_);
    memmove(raw_to->data_, raw_from->data_, length);
  }

  void CopyTypedData(const TypedData& from, const TypedData& to) {
    auto raw_from = from.ptr().untag();
    auto raw_to = to.ptr().untag();
    const intptr_t cid = Types::GetTypedDataPtr(from)->GetClassId();
    ASSERT(raw_to->length_ == raw_from->length_);
    raw_to->RecomputeDataField();
    const intptr_t length =
        TypedData::ElementSizeInBytes(cid) * Smi::Value(raw_from->length_);
    CopyTypedDataBaseWithSafepointChecks(Base::thread_, from, to, length);
  }
void CopyTypedDataView(typename Types::TypedDataView from,
                       typename Types::TypedDataView to) {
  // This will forward & initialize the typed data.
  Base::ForwardCompressedPointer(
      from, to, OFFSET_OF(UntaggedTypedDataView, typed_data_));

  auto raw_from = UntagTypedDataView(from);
  auto raw_to = UntagTypedDataView(to);
  raw_to->length_ = raw_from->length_;
  raw_to->offset_in_bytes_ = raw_from->offset_in_bytes_;
  raw_to->data_ = nullptr;

  auto forwarded_backing_store =
      raw_to->typed_data_.Decompress(Base::heap_base_);
  if (forwarded_backing_store == Marker() ||
      forwarded_backing_store == Object::null()) {
    // Ensure the backing store is never "sentinel" - the scavenger doesn't
    // like it.
    Base::StoreCompressedPointerNoBarrier(
        Types::GetTypedDataViewPtr(to),
        OFFSET_OF(UntaggedTypedDataView, typed_data_), Object::null());
    raw_to->length_ = 0;
    raw_to->offset_in_bytes_ = 0;
    ASSERT(Base::exception_msg_ != nullptr);
    return;
  }

  const bool is_external =
      raw_from->data_ != raw_from->DataFieldForInternalTypedData();
  if (is_external) {
    // The raw_to is fully initialized at this point (see handling of external
    // typed data in [ForwardCompressedPointer]).
    raw_to->RecomputeDataField();
  } else {
    // The raw_to isn't initialized yet, but its address is valid, so we can
    // compute the data field it would use.
    raw_to->RecomputeDataFieldForInternalTypedData();
  }
  const bool is_external2 =
      raw_to->data_ != raw_to->DataFieldForInternalTypedData();
  ASSERT(is_external == is_external2);
}

void CopyExternalTypedData(typename Types::ExternalTypedData from,
                           typename Types::ExternalTypedData to) {
  // The external typed data is initialized on the forwarding pass (where
  // normally allocation but not initialization happens), so views on it
  // can be initialized immediately.
#if defined(DEBUG)
  auto raw_from = UntagExternalTypedData(from);
  auto raw_to = UntagExternalTypedData(to);
  ASSERT(raw_to->data_ != nullptr);
  ASSERT(raw_to->length_ == raw_from->length_);
#endif
}

void CopyTransferableTypedData(typename Types::TransferableTypedData from,
                               typename Types::TransferableTypedData to) {
  // The [TransferableTypedData] is an empty object with an associated heap
  // peer object.
  // -> We'll validate that there's a peer and enqueue the transferable to be
  // transferred if the transitive copy is successful.
  auto fpeer = static_cast<TransferableTypedDataPeer*>(
      Base::heap_->GetPeer(Types::GetTransferableTypedDataPtr(from)));
  ASSERT(fpeer != nullptr);
  if (fpeer->data() == nullptr) {
    Base::exception_msg_ =
        "Illegal argument in isolate message"
        " : (TransferableTypedData has been transferred already)";
    return;
  }
  Base::EnqueueTransferable(from, to);
}

void CopyWeakProperty(typename Types::WeakProperty from,
                      typename Types::WeakProperty to) {
  // We store `null`s as keys/values and let the main algorithm know that
  // we should check reachability of the key again after the fixpoint (if it
  // became reachable, forward the key/value).
  Base::StoreCompressedPointerNoBarrier(Types::GetWeakPropertyPtr(to),
                                        OFFSET_OF(UntaggedWeakProperty, key_),
                                        Object::null());
  Base::StoreCompressedPointerNoBarrier(
      Types::GetWeakPropertyPtr(to), OFFSET_OF(UntaggedWeakProperty, value_),
      Object::null());
[vm] Implement `WeakReference` in the VM
This CL implements `WeakReference` in the VM.
* This reduces the size of weak references from 2 objects using 8 words
to 1 object using 4 words.
* This makes loads of weak reference targets a single load instead of
two.
* This avoids the fix-point in the GC and message object copying for
weak references. (N.b. Weak references need to be processed _after_
the fix-point for weak properties.)
The semantics of weak references in messages is that their target gets
set to `null` if the target is not included in the message by a strong
reference.
The tests take particular care to exercise the case where a weak
reference's target is only kept alive because a weak property key is
alive and it refers to the target in its value. This exercises the fact
that weak references need to be processed last.
Does not add support for weak references in the app snapshot. It would
be dead code until we start using weak references in, for example, the
CFE.
This CL does not try to unify weak references and weak properties in
the GC or messaging (as proposed in go/dart-vm-weakreference), because
their semantics differ enough.
Closes: https://github.com/dart-lang/sdk/issues/48162
TEST=runtime/tests/vm/dart/finalizer/weak_reference_run_gc_test.dart
TEST=runtime/tests/vm/dart/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/lib/isolate/weak_reference_message_1_test.dart
TEST=tests/lib/isolate/weak_reference_message_2_test.dart
Change-Id: I3810e919a5866f3ae8a95bd9aa23a880a0b0921c
Cq-Include-Trybots: luci.dart.try:app-kernel-linux-debug-x64-try,dart-sdk-mac-arm64-try,vm-canary-linux-debug-try,vm-fuchsia-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-linux-debug-simriscv64-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-x64-try,vm-kernel-nnbd-linux-release-ia32-try,vm-kernel-nnbd-linux-release-simarm64-try,vm-kernel-nnbd-linux-release-simarm-try,vm-kernel-nnbd-mac-debug-arm64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-win-release-ia32-try,vm-kernel-nnbd-win-release-x64-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-android-release-arm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-kernel-win-debug-ia32-try,vm-kernel-win-debug-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/232087
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-02-10 21:59:41 +00:00
  // To satisfy some ASSERT()s in GC we'll use Object::null() explicitly here.
  Base::StoreCompressedPointerNoBarrier(
      Types::GetWeakPropertyPtr(to),
      OFFSET_OF(UntaggedWeakProperty, next_seen_by_gc_), Object::null());
  Base::EnqueueWeakProperty(from);
}

void CopyWeakReference(typename Types::WeakReference from,
                       typename Types::WeakReference to) {
  // We store `null` as target and let the main algorithm know that
  // we should check reachability of the target again after the fixpoint (if
  // it became reachable, forward the target).
  Base::StoreCompressedPointerNoBarrier(
      Types::GetWeakReferencePtr(to),
      OFFSET_OF(UntaggedWeakReference, target_), Object::null());
  // Type argument should always be copied.
  Base::ForwardCompressedPointer(
      from, to, OFFSET_OF(UntaggedWeakReference, type_arguments_));
  // To satisfy some ASSERT()s in GC we'll use Object::null() explicitly here.
  Base::StoreCompressedPointerNoBarrier(
      Types::GetWeakReferencePtr(to),
      OFFSET_OF(UntaggedWeakReference, next_seen_by_gc_), Object::null());
  Base::EnqueueWeakReference(from);
}

#define DEFINE_UNSUPPORTED(clazz)                                              \
  void Copy##clazz(typename Types::clazz from, typename Types::clazz to) {     \
    FATAL("Objects of type " #clazz " should not occur in object graphs");     \
  }

  FOR_UNSUPPORTED_CLASSES(DEFINE_UNSUPPORTED)

#undef DEFINE_UNSUPPORTED

  UntaggedObject* UntagObject(typename Types::Object obj) {
    return Types::GetObjectPtr(obj).Decompress(Base::heap_base_).untag();
  }

#define DO(V)                                                                  \
  DART_FORCE_INLINE                                                            \
  Untagged##V* Untag##V(typename Types::V obj) {                               \
    return Types::Get##V##Ptr(obj).Decompress(Base::heap_base_).untag();       \
  }

  CLASS_LIST_FOR_HANDLES(DO)

#undef DO
};

class FastObjectCopy : public ObjectCopy<FastObjectCopyBase> {
 public:
  explicit FastObjectCopy(Thread* thread) : ObjectCopy(thread) {}
  ~FastObjectCopy() {}

  ObjectPtr TryCopyGraphFast(ObjectPtr root) {
    NoSafepointScope no_safepoint_scope;

    ObjectPtr root_copy = Forward(TagsFromUntaggedObject(root.untag()), root);
    if (root_copy == Marker()) {
      return root_copy;
    }
    auto& from_weak_property = WeakProperty::Handle(zone_);
    auto& to_weak_property = WeakProperty::Handle(zone_);
    auto& weak_property_key = Object::Handle(zone_);
    while (true) {
      if (fast_forward_map_.fill_cursor_ ==
          fast_forward_map_.raw_from_to_.length()) {
        break;
      }

      // Run fixpoint to copy all objects.
      while (fast_forward_map_.fill_cursor_ <
             fast_forward_map_.raw_from_to_.length()) {
        const intptr_t index = fast_forward_map_.fill_cursor_;
        ObjectPtr from = fast_forward_map_.raw_from_to_[index];
        ObjectPtr to = fast_forward_map_.raw_from_to_[index + 1];
        FastCopyObject(from, to);
        if (exception_msg_ != nullptr) {
          return root_copy;
        }
        fast_forward_map_.fill_cursor_ += 2;

        // To maintain responsiveness we regularly check whether safepoints
        // are requested - if so, we bail to the slow path, which will then
        // check in.
        if (thread_->IsSafepointRequested()) {
          exception_msg_ = kFastAllocationFailed;
          return root_copy;
        }
      }

      // Possibly forward values of [WeakProperty]s if keys became reachable.
      intptr_t i = 0;
      auto& weak_properties = fast_forward_map_.raw_weak_properties_;
      while (i < weak_properties.length()) {
        from_weak_property = weak_properties[i];
        weak_property_key =
            fast_forward_map_.ForwardedObject(from_weak_property.key());
        if (weak_property_key.ptr() != Marker()) {
          to_weak_property ^=
              fast_forward_map_.ForwardedObject(from_weak_property.ptr());

          // The key became reachable so we'll change the forwarded
          // [WeakProperty]'s key to the new key (it is `null` at this point).
          to_weak_property.set_key(weak_property_key);

          // Since the key has become strongly reachable in the copied graph,
          // we'll also need to forward the value.
          ForwardCompressedPointer(from_weak_property.ptr(),
                                   to_weak_property.ptr(),
                                   OFFSET_OF(UntaggedWeakProperty, value_));

          // We don't need to process this [WeakProperty] again.
          const intptr_t last = weak_properties.length() - 1;
          if (i < last) {
            weak_properties[i] = weak_properties[last];
            weak_properties.SetLength(last);
            continue;
          }
        }
        i++;
[vm/concurrency] Implement a fast transitive object copy for isolate message passing
We use message passing as the communication mechanism between isolates.
The transitive closure of an object to be sent is currently serialized
into a snapshot form and deserialized on the receiver side. Furthermore
the receiver side will re-hash any linked hashmaps in that graph.
If isolate groups are enabled, all isolates in a group work on the same
heap. That removes the need to use an intermediate serialization
format. It also removes the need for an O(n) step on the receiver side.
This CL implements a fast transitive object copy implementation and
makes use of it if a message that is to be passed to another isolate
stays within the same isolate group.
In the common case the object graph will fit into new space. So the
copy algorithm will try to take advantage of that by having a fast path
and a fallback path. Both of them effectively copy the graph in BFS
order.
The algorithm works effectively like a scavenge operation, but instead
of first copying the from-object to the to-space and then re-writing the
object in to-space to forward the pointers (which requires writing to
the to-space memory twice), we only reserve space for to-objects and
then initialize each to-object to its final contents, including
forwarded pointers (i.e. write the to-space object only once).
Compared with a scavenge operation (which stores forwarding pointers in
the objects themselves), we use a [WeakTable] to store them. This is the
only remaining expensive part of the algorithm and could be further
optimized. To avoid relying on iterating the to-space, we'll remember
[from, to] addresses.
=> All of this works inside a [NoSafepointOperationScope] and avoids
usage of handles as well as write barriers.
While doing the transitive object copy, we'll share any object we can
safely share (canonical objects, strings, sendports, ...) instead of
copying it.
If the fast path fails (due to allocation failure or a pending
safepoint request) we'll handlify any raw pointers and continue with
almost the same algorithm in a safe way, where GC is possible at every
object allocation site and normal barriers are used for any stores of
object pointers.
The copy algorithm uses templates to share the copy logic between the
fast and slow case (the same copy routines can work on raw pointers as
well as handles).
There are a few special things to take into consideration:
* If we copy a view on external typed data we need to know the
external typed data address to compute the inner pointer of the
view, so we'll eagerly initialize external typed data.
* All external typed data needs to get a finalizer attached
(irrespective of whether the object copy succeeds or not) to ensure
the `malloc()`ed data is freed again.
* Transferables will only be transferred on successful transitive
copies. Also they need to attach finalizers to objects (which
requires all objects to be in handles).
* We copy linked hashmaps as they are - instead of compressing the
data by removing deleted entries. We may need to re-hash those
hashmaps on the receiver side (similar to the snapshot-based copy
approach) since the objects in the new graph will have no identity
hash codes assigned. Though if a hashmap only has shareable objects
as keys (very common, e.g. JSON) there is no need for re-hashing.
It changes the SendPort.* benchmarks as follows:
```
Benchmark | default | IG | IG + FOC
----------------------------------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw): | 0.25 us (1 x) | 0.26 us (0.96 x) | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw): | 4.15 us (1 x) | 1.45 us (2.86 x) | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw): | 82.16 us (1 x) | 27.17 us (3.02 x) | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw): | 784.70 us (1 x) | 242.10 us (3.24 x) | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw): | 8510.4 us (1 x) | 3083.80 us (2.76 x) | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw): | 122381.33 us (1 x) | 62959.40 us (1.94 x) | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw): | 1.91 us (1 x) | 0.92 us (2.08 x) | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw): | 6.32 us (1 x) | 2.70 us (2.34 x) | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw): | 25.24 us (1 x) | 10.47 us (2.41 x) | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw): | 104.08 us (1 x) | 41.08 us (2.53 x) | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw): | 373.39 us (1 x) | 174.11 us (2.14 x) | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw): | 1588.64 us (1 x) | 893.18 us (1.78 x) | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw): | 6849.55 us (1 x) | 3705.19 us (1.85 x) | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw): | 0.67 us (1 x) | 0.69 us (0.97 x) | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw): | 4.37 us (1 x) | 0.78 us (5.60 x) | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw): | 45.67 us (1 x) | 0.90 us (50.74 x) | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw): | 498.81 us (1 x) | 1.24 us (402.27 x) | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw): | 5366.02 us (1 x) | 4.22 us (1271.57 x) | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw): | 101050.88 us (1 x) | 20.81 us (4855.88 x) | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw): | 3.91 us (1 x) | 0.76 us (5.14 x) | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw): | 9.90 us (1 x) | 0.79 us (12.53 x) | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw): | 33.09 us (1 x) | 0.87 us (38.03 x) | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw): | 126.77 us (1 x) | 0.92 us (137.79 x) | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x) | 0.94 us (567.12 x) | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x) | 3.03 us (733.74 x) | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
  }
}
[vm] Implement `WeakReference` in the VM
This CL implements `WeakReference` in the VM.
* This reduces the size of weak references from 2 objects using 8 words
to 1 object using 4 words.
* This makes loads of weak reference targets a single load instead of
two.
* This avoids the fix-point in the GC and message object copying for
weak references. (N.b. Weak references need to be processed _after_
the fix-point for weak properties.)
The semantics of weak references in messages are that their target gets
set to `null` if the target is not included in the message by a strong
reference.
The tests take particular care to exercise the case where a weak
reference's target is only kept alive because a weak property key is
alive and the property refers to the target in its value. This
exercises the fact that weak references need to be processed last.
Does not add support for weak references in the app snapshot. It would
be dead code until we start using weak references in, for example, the
CFE.
This CL does not try to unify weak references and weak properties in
the GC or messaging (as proposed in go/dart-vm-weakreference), because
their semantics differ enough.
Closes: https://github.com/dart-lang/sdk/issues/48162
TEST=runtime/tests/vm/dart/finalizer/weak_reference_run_gc_test.dart
TEST=runtime/tests/vm/dart/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/lib/isolate/weak_reference_message_1_test.dart
TEST=tests/lib/isolate/weak_reference_message_2_test.dart
Change-Id: I3810e919a5866f3ae8a95bd9aa23a880a0b0921c
Cq-Include-Trybots: luci.dart.try:app-kernel-linux-debug-x64-try,dart-sdk-mac-arm64-try,vm-canary-linux-debug-try,vm-fuchsia-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-linux-debug-simriscv64-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-x64-try,vm-kernel-nnbd-linux-release-ia32-try,vm-kernel-nnbd-linux-release-simarm64-try,vm-kernel-nnbd-linux-release-simarm-try,vm-kernel-nnbd-mac-debug-arm64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-win-release-ia32-try,vm-kernel-nnbd-win-release-x64-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-android-release-arm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-kernel-win-debug-ia32-try,vm-kernel-win-debug-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/232087
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-02-10 21:59:41 +00:00
  // After the fix point with [WeakProperty]s do [WeakReference]s.
  auto& from_weak_reference = WeakReference::Handle(zone_);
  auto& to_weak_reference = WeakReference::Handle(zone_);
  auto& weak_reference_target = Object::Handle(zone_);
  auto& weak_references = fast_forward_map_.raw_weak_references_;
  for (intptr_t i = 0; i < weak_references.length(); i++) {
    from_weak_reference = weak_references[i];
    weak_reference_target =
        fast_forward_map_.ForwardedObject(from_weak_reference.target());
    if (weak_reference_target.ptr() != Marker()) {
      to_weak_reference ^=
          fast_forward_map_.ForwardedObject(from_weak_reference.ptr());

      // The target became reachable so we'll change the forwarded
      // [WeakReference]'s target to the new target (it is `null` at this
      // point).
      to_weak_reference.set_target(weak_reference_target);
    }
  }
  if (root_copy != Marker()) {
    ObjectPtr array;
    array = TryBuildArrayOfObjectsToRehash(
        fast_forward_map_.raw_objects_to_rehash_);
    if (array == Marker()) return root_copy;
    raw_objects_to_rehash_ = Array::RawCast(array);

    array = TryBuildArrayOfObjectsToRehash(
        fast_forward_map_.raw_expandos_to_rehash_);
    if (array == Marker()) return root_copy;
    raw_expandos_to_rehash_ = Array::RawCast(array);
  }
  return root_copy;
}

ObjectPtr TryBuildArrayOfObjectsToRehash(
    const GrowableArray<ObjectPtr>& objects_to_rehash) {
  const intptr_t length = objects_to_rehash.length();
  if (length == 0) return Object::null();

    const intptr_t size = Array::InstanceSize(length);
    const uword array_addr = new_space_->TryAllocateNoSafepoint(thread_, size);
    if (array_addr == 0) {
      exception_msg_ = kFastAllocationFailed;
      return Marker();
    }

    const uword header_size =
        UntaggedObject::SizeTag::SizeFits(size) ? size : 0;
    ArrayPtr array(reinterpret_cast<UntaggedArray*>(array_addr));
    SetNewSpaceTaggingWord(array, kArrayCid, header_size);
    StoreCompressedPointerNoBarrier(array, OFFSET_OF(UntaggedArray, length_),
                                    Smi::New(length));
    StoreCompressedPointerNoBarrier(array,
                                    OFFSET_OF(UntaggedArray, type_arguments_),
                                    TypeArguments::null());
    auto array_data = array.untag()->data();
    for (intptr_t i = 0; i < length; ++i) {
      array_data[i] = objects_to_rehash[i];
    }
    return array;
  }

 private:
  friend class ObjectGraphCopier;

  void FastCopyObject(ObjectPtr from, ObjectPtr to) {
    const uword tags = TagsFromUntaggedObject(from.untag());
    const intptr_t cid = UntaggedObject::ClassIdTag::decode(tags);
    const intptr_t size = UntaggedObject::SizeTag::decode(tags);

    // Ensure the last word is GC-safe (our heap objects are 2-word aligned, the
    // object header stores the size in multiples of kObjectAlignment, the GC
    // uses the information from the header and therefore might visit one slot
    // more than the actual size of the instance).
    *reinterpret_cast<ObjectPtr*>(UntaggedObject::ToAddr(to) +
                                  from.untag()->HeapSize() - kWordSize) = 0;
    SetNewSpaceTaggingWord(to, cid, size);

    // Fall back to virtual variant for predefined classes.
    if (cid < kNumPredefinedCids && cid != kInstanceCid) {
      CopyPredefinedInstance(from, to, cid);
      return;
    }
#if defined(DART_PRECOMPILED_RUNTIME)
    const auto bitmap =
        class_table_->shared_class_table()->GetUnboxedFieldsMapAt(cid);
    CopyUserdefinedInstanceAOT(Instance::RawCast(from), Instance::RawCast(to),
                               bitmap);
#else
    CopyUserdefinedInstance(Instance::RawCast(from), Instance::RawCast(to));
#endif
    if (cid == expando_cid_) {
      EnqueueExpandoToRehash(to);
    }
[vm/concurrency] Implement a fast transitive object copy for isolate message passing
We use message passing as comunication mechanism between isolates.
The transitive closure of an object to be sent is currently serialized
into a snapshot form and deserialized on the receiver side. Furthermore
the receiver side will re-hash any linked hashmaps in that graph.
If isolate gropus are enabled we have all isolates in a group work on
the same heap. That removes the need to use an intermediate
serialization format. It also removes the need for an O(n) step on the
receiver side.
This CL implements a fast transitive object copy implementation and
makes use of it a message that is to be passed to another isolate stays
within the same isolate group.
In the common case the object graph will fit into new space. So the
copy algorithm will try to take advantage of it by having a fast path
and a fallback path. Both of them effectively copy the graph in BFS
order.
The algorithm works effectively like a scavenge operation, but instead
of first copying the from-object to the to-space and then re-writing the
object in to-space to forward the pointers (which requires us writing to
the to-space memory twice), we only reserve space for to-objects and
then initialize the to-objects to it's final contents, including
forwarded pointers (i.e. write the to-space object only once).
Compared with a scavenge operation (which stores forwarding pointers in
the objects themselves), we use a [WeakTable] to store them. This is the
only remaining expensive part of the algorithm and could be further
optimized. To avoid relying on iterating the to-space, we'll remember
[from, to] addresses.
=> All of this works inside a [NoSafepointOperationScope] and avoids
usages of handles as well as write barriers.
While doing the transitive object copy, we'll share any object we can
safely share (canonical objects, strings, sendports, ...) instead of
copying it.
If the fast path fails (due to allocation failure or hitting) we'll
handlify any raw pointers and continue almost the same algorithm in a
safe way, where GC is possible at every object allocation site and
normal barriers are used for any stores of object pointers.
The copy algorithm uses templates to share the copy logic between the
fast and slow case (same copy routines can work on raw pointers as well
as handles).
There's a few special things to take into consideration:
* If we copy a view on external typed data we need to know the
external typed data address to compute the inner pointer of the
view, so we'll eagerly initialize external typed data.
* All external typed data needs to get a finalizer attached
(irrespective if the object copy suceeds or not) to ensure the
`malloc()`ed data is freed again.
* Transferables will only be transferred on successful transitive
copies. Also they need to attach finalizers to objects (which
requires all objects be in handles).
* We copy linked hashmaps as they are - instead of compressing the
data by removing deleted entries. We may need to re-hash those
hashmaps on the receiver side (similar to the snapshot-based copy
approach) since new object graph will have no identity hash codes
assigned to them. Though if the hashmaps only has sharable objects
as keys (very common, e.g. json) there is no need for re-hashing.
It changes the SendPort.* benchmarks as follows:
```
Benchmark | default | IG | IG + FOC
----------------------------------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw): | 0.25 us (1 x) | 0.26 us (0.96 x) | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw): | 4.15 us (1 x) | 1.45 us (2.86 x) | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw): | 82.16 us (1 x) | 27.17 us (3.02 x) | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw): | 784.70 us (1 x) | 242.10 us (3.24 x) | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw): | 8510.4 us (1 x) | 3083.80 us (2.76 x) | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw): | 122381.33 us (1 x) | 62959.40 us (1.94 x) | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw): | 1.91 us (1 x) | 0.92 us (2.08 x) | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw): | 6.32 us (1 x) | 2.70 us (2.34 x) | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw): | 25.24 us (1 x) | 10.47 us (2.41 x) | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw): | 104.08 us (1 x) | 41.08 us (2.53 x) | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw): | 373.39 us (1 x) | 174.11 us (2.14 x) | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw): | 1588.64 us (1 x) | 893.18 us (1.78 x) | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw): | 6849.55 us (1 x) | 3705.19 us (1.85 x) | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw): | 0.67 us (1 x) | 0.69 us (0.97 x) | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw): | 4.37 us (1 x) | 0.78 us (5.60 x) | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw): | 45.67 us (1 x) | 0.90 us (50.74 x) | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw): | 498.81 us (1 x) | 1.24 us (402.27 x) | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw): | 5366.02 us (1 x) | 4.22 us (1271.57 x) | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw): | 101050.88 us (1 x) | 20.81 us (4855.88 x) | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw): | 3.91 us (1 x) | 0.76 us (5.14 x) | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw): | 9.90 us (1 x) | 0.79 us (12.53 x) | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw): | 33.09 us (1 x) | 0.87 us (38.03 x) | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw): | 126.77 us (1 x) | 0.92 us (137.79 x) | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x) | 0.94 us (567.12 x) | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x) | 3.03 us (733.74 x) | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
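The re-hashing concern from the linked-hashmap bullet above can be illustrated with a small sketch. This is plain C++, not VM code; `ObjId`, `Rehash`, and the forwarding map are hypothetical stand-ins for identity hash codes and the role the copier's WeakTable plays:

```cpp
#include <unordered_map>

// Stand-in for an object's identity hash code.
using ObjId = int;

// Rebuild an identity-keyed map after a transitive copy, given a
// forwarding map old-identity -> new-identity. Keys that were shared
// rather than copied (canonical objects, strings, ...) keep their
// identity and need no forwarding -- which is why graphs whose map
// keys are all sharable (e.g. JSON data) skip re-hashing entirely.
std::unordered_map<ObjId, int> Rehash(
    const std::unordered_map<ObjId, int>& old_map,
    const std::unordered_map<ObjId, ObjId>& forwarding) {
  std::unordered_map<ObjId, int> result;
  for (const auto& [old_key, value] : old_map) {
    auto it = forwarding.find(old_key);
    // Copied keys hash under their new identity; shared keys keep theirs.
    result[it == forwarding.end() ? old_key : it->second] = value;
  }
  return result;
}
```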
  }

  ArrayPtr raw_objects_to_rehash_ = Array::null();
  ArrayPtr raw_expandos_to_rehash_ = Array::null();
};

class SlowObjectCopy : public ObjectCopy<SlowObjectCopyBase> {
 public:
  explicit SlowObjectCopy(Thread* thread)
      : ObjectCopy(thread),
        objects_to_rehash_(Array::Handle(thread->zone())),
        expandos_to_rehash_(Array::Handle(thread->zone())) {}
  ~SlowObjectCopy() {}

  ObjectPtr ContinueCopyGraphSlow(const Object& root,
                                  const Object& fast_root_copy) {
    auto& root_copy = Object::Handle(Z, fast_root_copy.ptr());
    if (root_copy.ptr() == Marker()) {
      root_copy = Forward(TagsFromUntaggedObject(root.ptr().untag()), root);
    }

    WeakProperty& weak_property = WeakProperty::Handle(Z);
    Object& from = Object::Handle(Z);
    Object& to = Object::Handle(Z);
    while (true) {
      if (slow_forward_map_.fill_cursor_ ==
          slow_forward_map_.from_to_.Length()) {
        break;
      }

      // Run fixpoint to copy all objects.
      while (slow_forward_map_.fill_cursor_ <
             slow_forward_map_.from_to_.Length()) {
        const intptr_t index = slow_forward_map_.fill_cursor_;
        from = slow_forward_map_.from_to_.At(index);
        to = slow_forward_map_.from_to_.At(index + 1);
        CopyObject(from, to);
        slow_forward_map_.fill_cursor_ += 2;
        if (exception_msg_ != nullptr) {
          return Marker();
        }
        // To maintain responsiveness we regularly check whether safepoints are
        // requested.
        thread_->CheckForSafepoint();
      }

      // Possibly forward values of [WeakProperty]s if keys became reachable.
      intptr_t i = 0;
      auto& weak_properties = slow_forward_map_.weak_properties_;
      while (i < weak_properties.length()) {
        const auto& from_weak_property = *weak_properties[i];
        to = slow_forward_map_.ForwardedObject(from_weak_property.key());
        if (to.ptr() != Marker()) {
          weak_property ^=
              slow_forward_map_.ForwardedObject(from_weak_property.ptr());

          // The key became reachable so we'll change the forwarded
          // [WeakProperty]'s key to the new key (it is `null` at this point).
          weak_property.set_key(to);

          // Since the key has become strongly reachable in the copied graph,
          // we'll also need to forward the value.
          ForwardCompressedPointer(from_weak_property, weak_property,
                                   OFFSET_OF(UntaggedWeakProperty, value_));

          // We don't need to process this [WeakProperty] again.
          const intptr_t last = weak_properties.length() - 1;
          if (i < last) {
            weak_properties[i] = weak_properties[last];
            weak_properties.SetLength(last);
            continue;
          }
        }
        i++;
      }
    }
[vm] Implement `WeakReference` in the VM
This CL implements `WeakReference` in the VM.
* This reduces the size of weak references from 2 objects using 8 words
to 1 object using 4 words.
* This makes loads of weak reference targets a single load instead of
two.
* This avoids the fix-point in the GC and message object copying for
weak references. (N.b. Weak references need to be processed _after_
the fix-point for weak properties.)
The semantics of weak references in messages are that their target gets
set to `null` if the target is not included in the message by a strong
reference.
The tests take particular care to exercise the case where a weak
reference's target is only kept alive because a weak property key is
alive and it refers to the target in its value. This exercises the fact
that weak references need to be processed last.
This CL does not add support for weak references in the app snapshot;
that would be dead code until we start using weak references in, for
example, the CFE.
This CL does not try to unify weak references and weak properties in
the GC or messaging (as proposed in go/dart-vm-weakreference), because
their semantics differ enough.
Closes: https://github.com/dart-lang/sdk/issues/48162
TEST=runtime/tests/vm/dart/finalizer/weak_reference_run_gc_test.dart
TEST=runtime/tests/vm/dart/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/lib/isolate/weak_reference_message_1_test.dart
TEST=tests/lib/isolate/weak_reference_message_2_test.dart
Change-Id: I3810e919a5866f3ae8a95bd9aa23a880a0b0921c
Cq-Include-Trybots: luci.dart.try:app-kernel-linux-debug-x64-try,dart-sdk-mac-arm64-try,vm-canary-linux-debug-try,vm-fuchsia-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-linux-debug-simriscv64-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-x64-try,vm-kernel-nnbd-linux-release-ia32-try,vm-kernel-nnbd-linux-release-simarm64-try,vm-kernel-nnbd-linux-release-simarm-try,vm-kernel-nnbd-mac-debug-arm64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-win-release-ia32-try,vm-kernel-nnbd-win-release-x64-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-android-release-arm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-kernel-win-debug-ia32-try,vm-kernel-win-debug-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/232087
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-02-10 21:59:41 +00:00
// After the fix point with [WeakProperty]s do [WeakReference]s.
WeakReference& weak_reference = WeakReference::Handle(Z);
auto& weak_references = slow_forward_map_.weak_references_;
for (intptr_t i = 0; i < weak_references.length(); i++) {
  const auto& from_weak_reference = *weak_references[i];
  to = slow_forward_map_.ForwardedObject(from_weak_reference.target());
  if (to.ptr() != Marker()) {
    weak_reference ^=
        slow_forward_map_.ForwardedObject(from_weak_reference.ptr());
    // The target became reachable so we'll change the forwarded
    // [WeakReference]'s target to the new target (it is `null` at this
    // point).
    weak_reference.set_target(to);
  }
}
objects_to_rehash_ =
    BuildArrayOfObjectsToRehash(slow_forward_map_.objects_to_rehash_);
expandos_to_rehash_ =
    BuildArrayOfObjectsToRehash(slow_forward_map_.expandos_to_rehash_);
  return root_copy.ptr();
}

ArrayPtr BuildArrayOfObjectsToRehash(
    const GrowableArray<const Object*>& objects_to_rehash) {
  const intptr_t length = objects_to_rehash.length();
  if (length == 0) return Array::null();
  const auto& array = Array::Handle(zone_, Array::New(length));
  for (intptr_t i = 0; i < length; ++i) {
    array.SetAt(i, *objects_to_rehash[i]);
|
[vm/concurrency] Implement a fast transitive object copy for isolate message passing
We use message passing as comunication mechanism between isolates.
The transitive closure of an object to be sent is currently serialized
into a snapshot form and deserialized on the receiver side. Furthermore
the receiver side will re-hash any linked hashmaps in that graph.
If isolate gropus are enabled we have all isolates in a group work on
the same heap. That removes the need to use an intermediate
serialization format. It also removes the need for an O(n) step on the
receiver side.
This CL implements a fast transitive object copy implementation and
makes use of it a message that is to be passed to another isolate stays
within the same isolate group.
In the common case the object graph will fit into new space. So the
copy algorithm will try to take advantage of it by having a fast path
and a fallback path. Both of them effectively copy the graph in BFS
order.
The algorithm works effectively like a scavenge operation, but instead
of first copying the from-object to the to-space and then re-writing the
object in to-space to forward the pointers (which requires us writing to
the to-space memory twice), we only reserve space for to-objects and
then initialize the to-objects to it's final contents, including
forwarded pointers (i.e. write the to-space object only once).
Compared with a scavenge operation (which stores forwarding pointers in
the objects themselves), we use a [WeakTable] to store them. This is the
only remaining expensive part of the algorithm and could be further
optimized. To avoid relying on iterating the to-space, we'll remember
[from, to] addresses.
=> All of this works inside a [NoSafepointOperationScope] and avoids
usages of handles as well as write barriers.
While doing the transitive object copy, we'll share any object we can
safely share (canonical objects, strings, sendports, ...) instead of
copying it.
If the fast path fails (due to allocation failure or hitting) we'll
handlify any raw pointers and continue almost the same algorithm in a
safe way, where GC is possible at every object allocation site and
normal barriers are used for any stores of object pointers.
The copy algorithm uses templates to share the copy logic between the
fast and slow case (same copy routines can work on raw pointers as well
as handles).
There's a few special things to take into consideration:
* If we copy a view on external typed data we need to know the
external typed data address to compute the inner pointer of the
view, so we'll eagerly initialize external typed data.
* All external typed data needs to get a finalizer attached
(irrespective of whether the object copy succeeds or not) to ensure the
`malloc()`ed data is freed again.
* Transferables will only be transferred on successful transitive
copies. They also need to attach finalizers to objects (which
requires all objects to be in handles).
* We copy linked hashmaps as they are - instead of compressing the
data by removing deleted entries. We may need to re-hash those
hashmaps on the receiver side (similar to the snapshot-based copy
approach) since the new object graph will have no identity hash
codes assigned. Though if a hashmap only has sharable objects as
keys (very common, e.g. JSON) there is no need for re-hashing.
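Why only identity-keyed maps need rehashing can be illustrated with a small sketch (hypothetical `Key` type; object identity is modeled by its address, standing in for Dart's identity hash code):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical key object; its "identity hash" is modeled by its address.
struct Key {
  std::string value;
};

// Returns {identity-keyed hits, value-keyed hits} when looking up a *copy*
// of the original key. An identity-keyed table (like a map keyed by Dart
// identity hash codes) misses after a copy and must be re-hashed; a table
// keyed by a sharable, value-hashed key (e.g. a string) still hits.
std::pair<int, int> LookupAfterCopy() {
  Key original{"a"};
  std::unordered_map<const Key*, int> by_identity{{&original, 1}};
  std::unordered_map<std::string, int> by_value{{original.value, 1}};
  Key copy = original;  // the "received" object has a new identity
  return {static_cast<int>(by_identity.count(&copy)),
          static_cast<int>(by_value.count(copy.value))};
}
```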
It changes the SendPort.* benchmarks as follows:
```
Benchmark | default | IG | IG + FOC
----------------------------------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw): | 0.25 us (1 x) | 0.26 us (0.96 x) | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw): | 4.15 us (1 x) | 1.45 us (2.86 x) | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw): | 82.16 us (1 x) | 27.17 us (3.02 x) | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw): | 784.70 us (1 x) | 242.10 us (3.24 x) | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw): | 8510.4 us (1 x) | 3083.80 us (2.76 x) | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw): | 122381.33 us (1 x) | 62959.40 us (1.94 x) | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw): | 1.91 us (1 x) | 0.92 us (2.08 x) | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw): | 6.32 us (1 x) | 2.70 us (2.34 x) | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw): | 25.24 us (1 x) | 10.47 us (2.41 x) | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw): | 104.08 us (1 x) | 41.08 us (2.53 x) | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw): | 373.39 us (1 x) | 174.11 us (2.14 x) | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw): | 1588.64 us (1 x) | 893.18 us (1.78 x) | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw): | 6849.55 us (1 x) | 3705.19 us (1.85 x) | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw): | 0.67 us (1 x) | 0.69 us (0.97 x) | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw): | 4.37 us (1 x) | 0.78 us (5.60 x) | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw): | 45.67 us (1 x) | 0.90 us (50.74 x) | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw): | 498.81 us (1 x) | 1.24 us (402.27 x) | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw): | 5366.02 us (1 x) | 4.22 us (1271.57 x) | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw): | 101050.88 us (1 x) | 20.81 us (4855.88 x) | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw): | 3.91 us (1 x) | 0.76 us (5.14 x) | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw): | 9.90 us (1 x) | 0.79 us (12.53 x) | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw): | 33.09 us (1 x) | 0.87 us (38.03 x) | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw): | 126.77 us (1 x) | 0.92 us (137.79 x) | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x) | 0.94 us (567.12 x) | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x) | 3.03 us (733.74 x) | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
  }
  return array.ptr();
  }

 private:
  friend class ObjectGraphCopier;

  void CopyObject(const Object& from, const Object& to) {
    const auto cid = from.GetClassId();

    // Fall back to virtual variant for predefined classes
    if (cid < kNumPredefinedCids && cid != kInstanceCid) {
      CopyPredefinedInstance(from, to, cid);
      return;
    }
#if defined(DART_PRECOMPILED_RUNTIME)
    const auto bitmap =
        class_table_->shared_class_table()->GetUnboxedFieldsMapAt(cid);
    CopyUserdefinedInstanceAOT(from, to, bitmap);
#else
    CopyUserdefinedInstance(from, to);
#endif
    if (cid == expando_cid_) {
      EnqueueExpandoToRehash(to);
    }
  }

  Array& objects_to_rehash_;
  Array& expandos_to_rehash_;
};

class ObjectGraphCopier {
 public:
  explicit ObjectGraphCopier(Thread* thread)
      : thread_(thread),
        zone_(thread->zone()),
        fast_object_copy_(thread_),
        slow_object_copy_(thread_) {
    thread_->isolate()->set_forward_table_new(new WeakTable());
    thread_->isolate()->set_forward_table_old(new WeakTable());
  }

  ~ObjectGraphCopier() {
    thread_->isolate()->set_forward_table_new(nullptr);
    thread_->isolate()->set_forward_table_old(nullptr);
  }

  // Result will be
  //   [
  //     <message>,
  //     <collection-lib-objects-to-rehash>,
  //     <core-lib-objects-to-rehash>,
  //   ]
  ObjectPtr CopyObjectGraph(const Object& root) {
    const char* volatile exception_msg = nullptr;
    auto& result = Object::Handle(zone_);
    {
      LongJumpScope jump;  // e.g. for OOMs.
      if (setjmp(*jump.Set()) == 0) {
        result = CopyObjectGraphInternal(root, &exception_msg);
        // Any allocated external typed data must have finalizers attached so
        // the memory will get free()d.
        slow_object_copy_.slow_forward_map_.FinalizeExternalTypedData();
      } else {
        // Any allocated external typed data must have finalizers attached so
        // the memory will get free()d.
        slow_object_copy_.slow_forward_map_.FinalizeExternalTypedData();

        // The copy failed due to a non-application error (e.g. an OOM error);
        // propagate this error.
        result = thread_->StealStickyError();
        RELEASE_ASSERT(result.IsError());
      }
    }
    if (result.IsError()) {
      Exceptions::PropagateError(Error::Cast(result));
      UNREACHABLE();
    }
    if (result.ptr() == Marker()) {
      ASSERT(exception_msg != nullptr);
      ThrowException(exception_msg);
      UNREACHABLE();
    }
    // The copy was successful; detach the transferable data from the sender
    // and attach it to the copied graph.
    slow_object_copy_.slow_forward_map_.FinalizeTransferables();
    return result.ptr();
  }

  intptr_t allocated_bytes() { return allocated_bytes_; }

  intptr_t copied_objects() { return copied_objects_; }
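`CopyObjectGraph` guards the internal copy with a `LongJumpScope`: a deep callee that hits a non-application error escapes directly back to the `setjmp` site instead of unwinding through return codes, and the `else` branch turns the escape into an error object. A minimal standalone sketch of that setjmp/longjmp pattern, with illustrative names (`RunGuarded` and `DeepOperation` are not VM API):

```cpp
#include <csetjmp>
#include <cstddef>

static std::jmp_buf g_jump;
static const char* g_error = nullptr;

// A deep callee that, on failure, records an error and escapes straight
// back to the setjmp site (like the VM's longjmp on OOM).
void DeepOperation(bool fail) {
  if (fail) {
    g_error = "out of memory";
    std::longjmp(g_jump, 1);
  }
}

// Returns nullptr on success, the recorded error message otherwise.
// setjmp(...) == 0 is the normal path; a longjmp re-enters with 1.
const char* RunGuarded(bool fail) {
  if (setjmp(g_jump) == 0) {
    DeepOperation(fail);
    return nullptr;  // success branch
  }
  return g_error;    // recovery branch (the `else` in the VM code)
}
```

Note the usual caveat: longjmp must not jump over frames with live non-trivial C++ objects, which is why the VM confines this pattern to a dedicated scope.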
 private:
  ObjectPtr CopyObjectGraphInternal(const Object& root,
                                    const char* volatile* exception_msg) {
    const auto& result_array = Array::Handle(zone_, Array::New(3));
    if (!root.ptr()->IsHeapObject()) {
      result_array.SetAt(0, root);
      return result_array.ptr();
    }
    const uword tags = TagsFromUntaggedObject(root.ptr().untag());
    if (CanShareObject(root.ptr(), tags)) {
      result_array.SetAt(0, root);
      return result_array.ptr();
    }
    if (!fast_object_copy_.CanCopyObject(tags, root.ptr())) {
      ASSERT(fast_object_copy_.exception_msg_ != nullptr);
      *exception_msg = fast_object_copy_.exception_msg_;
      return Marker();
    }

    // We try a fast new-space-only copy first that will not use any barriers.
    auto& result = Object::Handle(Z, Marker());

    // All allocated but non-initialized heap objects have to be made
    // GC-visible at this point.
    if (FLAG_enable_fast_object_copy) {
      {
        NoSafepointScope no_safepoint_scope;

        result = fast_object_copy_.TryCopyGraphFast(root.ptr());
        if (result.ptr() != Marker()) {
          if (fast_object_copy_.exception_msg_ == nullptr) {
            result_array.SetAt(0, result);
            fast_object_copy_.tmp_ = fast_object_copy_.raw_objects_to_rehash_;
            result_array.SetAt(1, fast_object_copy_.tmp_);
            fast_object_copy_.tmp_ = fast_object_copy_.raw_expandos_to_rehash_;
            result_array.SetAt(2, fast_object_copy_.tmp_);
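The control flow here is the fast-path/fallback shape the commit message describes: `TryCopyGraphFast` may give up (e.g. on a new-space allocation failure) by returning the marker, and only then does the caller retry on the general, handle-based slow path. A hedged, self-contained sketch of that shape — `TryCopyFast`, `CopySlow`, and the size budget are illustrative stand-ins, not VM API:

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Fast path: succeeds only if the input "fits" within a budget, standing in
// for the graph fitting into new space; otherwise it gives up (nullopt
// plays the role of the Marker() sentinel).
std::optional<std::vector<int>> TryCopyFast(const std::vector<int>& v,
                                            std::size_t budget) {
  if (v.size() > budget) return std::nullopt;
  return v;
}

// Fallback: always succeeds, standing in for the slower copy where GC is
// possible at every allocation site and normal barriers are used.
std::vector<int> CopySlow(const std::vector<int>& v) { return v; }

// Try the cheap path first; retry on the general path only on refusal.
std::vector<int> CopyWithFallback(const std::vector<int>& v,
                                  std::size_t budget) {
  if (auto fast = TryCopyFast(v, budget)) return *fast;
  return CopySlow(v);
}
```

Either way the caller gets the same result; the budget only decides which path produced it.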
[vm/concurrency] Implement a fast transitive object copy for isolate message passing
We use message passing as comunication mechanism between isolates.
The transitive closure of an object to be sent is currently serialized
into a snapshot form and deserialized on the receiver side. Furthermore
the receiver side will re-hash any linked hashmaps in that graph.
If isolate gropus are enabled we have all isolates in a group work on
the same heap. That removes the need to use an intermediate
serialization format. It also removes the need for an O(n) step on the
receiver side.
This CL implements a fast transitive object copy implementation and
makes use of it a message that is to be passed to another isolate stays
within the same isolate group.
In the common case the object graph will fit into new space. So the
copy algorithm will try to take advantage of it by having a fast path
and a fallback path. Both of them effectively copy the graph in BFS
order.
The algorithm works effectively like a scavenge operation, but instead
of first copying the from-object to the to-space and then re-writing the
object in to-space to forward the pointers (which requires us writing to
the to-space memory twice), we only reserve space for to-objects and
then initialize the to-objects to it's final contents, including
forwarded pointers (i.e. write the to-space object only once).
Compared with a scavenge operation (which stores forwarding pointers in
the objects themselves), we use a [WeakTable] to store them. This is the
only remaining expensive part of the algorithm and could be further
optimized. To avoid relying on iterating the to-space, we'll remember
[from, to] addresses.
=> All of this works inside a [NoSafepointOperationScope] and avoids
usages of handles as well as write barriers.
While doing the transitive object copy, we'll share any object we can
safely share (canonical objects, strings, sendports, ...) instead of
copying it.
If the fast path fails (due to allocation failure or hitting) we'll
handlify any raw pointers and continue almost the same algorithm in a
safe way, where GC is possible at every object allocation site and
normal barriers are used for any stores of object pointers.
The copy algorithm uses templates to share the copy logic between the
fast and slow case (same copy routines can work on raw pointers as well
as handles).
There's a few special things to take into consideration:
* If we copy a view on external typed data we need to know the
external typed data address to compute the inner pointer of the
view, so we'll eagerly initialize external typed data.
* All external typed data needs to get a finalizer attached
(irrespective of whether the object copy succeeds or not) to ensure the
`malloc()`ed data is freed again.
* Transferables will only be transferred on successful transitive
copies. Also they need to attach finalizers to objects (which
requires all objects to be in handles).
* We copy linked hashmaps as they are - instead of compressing the
data by removing deleted entries. We may need to re-hash those
hashmaps on the receiver side (similar to the snapshot-based copy
approach) since the objects in the new graph will have no identity
hash codes assigned. Though if a hashmap only has shareable objects
as keys (very common, e.g. JSON) there is no need for re-hashing.
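The last bullet can be illustrated with standard containers: identity-hashed keys hash by object address, so a copy of the key misses in the copied map, while value-hashed keys (like strings, which are shared rather than copied) stay valid. Toy types, not the VM's hashmap:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

struct Obj { int payload; };

// Identity hash: depends on the object's address, so a copied object
// hashes differently and lands in the wrong bucket without re-hashing.
struct IdentityHash {
  size_t operator()(const Obj* o) const {
    return reinterpret_cast<size_t>(o);
  }
};

using IdentityMap = std::unordered_map<const Obj*, int, IdentityHash>;
```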
It changes the SendPort.* benchmarks as follows:
```
Benchmark | default | IG | IG + FOC
----------------------------------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw): | 0.25 us (1 x) | 0.26 us (0.96 x) | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw): | 4.15 us (1 x) | 1.45 us (2.86 x) | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw): | 82.16 us (1 x) | 27.17 us (3.02 x) | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw): | 784.70 us (1 x) | 242.10 us (3.24 x) | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw): | 8510.4 us (1 x) | 3083.80 us (2.76 x) | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw): | 122381.33 us (1 x) | 62959.40 us (1.94 x) | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw): | 1.91 us (1 x) | 0.92 us (2.08 x) | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw): | 6.32 us (1 x) | 2.70 us (2.34 x) | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw): | 25.24 us (1 x) | 10.47 us (2.41 x) | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw): | 104.08 us (1 x) | 41.08 us (2.53 x) | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw): | 373.39 us (1 x) | 174.11 us (2.14 x) | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw): | 1588.64 us (1 x) | 893.18 us (1.78 x) | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw): | 6849.55 us (1 x) | 3705.19 us (1.85 x) | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw): | 0.67 us (1 x) | 0.69 us (0.97 x) | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw): | 4.37 us (1 x) | 0.78 us (5.60 x) | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw): | 45.67 us (1 x) | 0.90 us (50.74 x) | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw): | 498.81 us (1 x) | 1.24 us (402.27 x) | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw): | 5366.02 us (1 x) | 4.22 us (1271.57 x) | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw): | 101050.88 us (1 x) | 20.81 us (4855.88 x) | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw): | 3.91 us (1 x) | 0.76 us (5.14 x) | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw): | 9.90 us (1 x) | 0.79 us (12.53 x) | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw): | 33.09 us (1 x) | 0.87 us (38.03 x) | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw): | 126.77 us (1 x) | 0.92 us (137.79 x) | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x) | 0.94 us (567.12 x) | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x) | 3.03 us (733.74 x) | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
```
HandlifyExternalTypedData();
HandlifyTransferables();
allocated_bytes_ =
    fast_object_copy_.fast_forward_map_.allocated_bytes;
copied_objects_ =
    fast_object_copy_.fast_forward_map_.fill_cursor_ / 2 -
    /*null_entry=*/1;
```
```
  return result_array.ptr();
}

// There are left-over uninitialized objects we'll have to make GC
// visible.
SwitchToSlowForwardingList();
}
}

if (FLAG_gc_on_foc_slow_path) {
  // We force the GC to compact, which is more likely to discover
  // untracked pointers (and other issues, like incorrect class table).
  thread_->heap()->CollectAllGarbage(GCReason::kDebugging,
```
Reland "[vm] Implement `Finalizer`"
Original CL in patchset 1.
Split-off https://dart-review.googlesource.com/c/sdk/+/238341
And pulled in fix https://dart-review.googlesource.com/c/sdk/+/238582
(Should merge cleanly when this lands later.)
This CL implements the `Finalizer` in the GC.
The GC is specially aware of two types of objects for the purposes of
running finalizers.
1) `FinalizerEntry`
2) `Finalizer` (`FinalizerBase`, `_FinalizerImpl`)
A `FinalizerEntry` contains the `value`, the optional `detach` key, the
`token`, and a reference to the `finalizer`.
An entry only holds on weakly to the value, detach key, and finalizer.
(Similar to how `WeakReference` only holds on weakly to target).
A `Finalizer` contains all entries, a list of entries whose values have
been collected, and a reference to the isolate.
When the value of an entry is GCed, the entry is moved over to the
collected list.
If any entry is moved to the collected list, a message is sent that
invokes the finalizer to call the callback on all entries in that list.
When a finalizer is detached by the user, the entry token is set to the
entry itself and is removed from the all entries set.
This ensures that if the entry was already moved to the collected list,
the finalizer is not executed.
To speed up detaching, we use a weak map from detach keys to lists of
entries. This ensures entries can be GCed.
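The detach-key index can be sketched as a map from key to the entries registered under it, so detaching does not have to scan all entries. All names here are illustrative, and `std::unordered_map` stands in for the VM's weak map (which additionally lets keys be GCed):

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct Entry {
  bool detached = false;
};

struct DetachIndex {
  std::unordered_map<void*, std::vector<Entry*>> by_key;

  void Attach(void* detach_key, Entry* entry) {
    by_key[detach_key].push_back(entry);
  }

  // Deactivate every entry registered under the key; returns how many.
  int Detach(void* detach_key) {
    auto it = by_key.find(detach_key);
    if (it == by_key.end()) return 0;
    int n = 0;
    for (Entry* e : it->second) {
      e->detached = true;  // mirrors "token set to the entry itself"
      ++n;
    }
    by_key.erase(it);
    return n;
  }
};
```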
Both the scavenger and marker tasks process finalizer entries in
parallel.
Parallel tasks use an atomic exchange on the head of the collected
entries list, ensuring no entries get lost.
The mutator thread is guaranteed to be stopped when processing entries.
This ensures that we do not need barriers for moving entries into the
finalizers collected list.
Dart reads and replaces the collected entries list also with an atomic
exchange, ensuring the GC doesn't run in between a load/store.
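The collected-entries list described above amounts to a lock-free singly linked list manipulated only by atomic exchanges on its head. A minimal sketch with illustrative types (note that, as the text says, drains never race with pushes because the mutator is stopped while GC tasks process entries):

```cpp
#include <atomic>
#include <cassert>

struct Entry {
  Entry* next = nullptr;
};

struct CollectedList {
  std::atomic<Entry*> head{nullptr};

  // GC tasks push with an atomic exchange: concurrent pushers interleave
  // without losing entries, since each installs itself and links to the
  // previous head it received back.
  void Push(Entry* e) {
    e->next = head.exchange(e, std::memory_order_acq_rel);
  }

  // The mutator swaps in nullptr, atomically taking ownership of the
  // whole chain in one step.
  Entry* DrainAll() {
    return head.exchange(nullptr, std::memory_order_acq_rel);
  }
};
```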
When a message is posted to a finalizer to process finalized objects,
the finalizer is kept alive by the message.
An alternative design would be to pre-allocate a `WeakReference` in the
finalizer pointing to the finalizer, and send that itself.
This would be at the cost of an extra object.
Send and exit is not supported in this CL, support will be added in a
follow up CL. Trying to send will throw.
Bug: https://github.com/dart-lang/sdk/issues/47777
TEST=runtime/tests/vm/dart/finalizer/*
TEST=runtime/tests/vm/dart_2/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
Change-Id: Ibdfeadc16d5d69ade50aae5b9f794284c4c4dbab
Cq-Include-Trybots: luci.dart.try:vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-ffi-android-debug-arm64c-try,dart-sdk-mac-arm64-try,vm-kernel-mac-release-arm64-try,pkg-mac-release-arm64-try,vm-kernel-precomp-nnbd-mac-release-arm64-try,vm-kernel-win-debug-x64c-try,vm-kernel-win-debug-x64-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-nnbd-win-release-ia32-try,vm-ffi-android-debug-arm-try,vm-precomp-ffi-qemu-linux-release-arm-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-ia32-try,benchmark-linux-try,flutter-analyze-try,flutter-frontend-try,pkg-linux-debug-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-linux-debug-simarm_x64-try,vm-kernel-precomp-obfuscate-linux-release-x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/238086
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-03-25 10:29:30 +00:00
```
                                   /*compact=*/true);
```
```
}

ObjectifyFromToObjects();
```
```
// Fast copy failed due to
// - either failure to allocate into new space
// - or failure to copy object which we cannot copy
ASSERT(fast_object_copy_.exception_msg_ != nullptr);
if (fast_object_copy_.exception_msg_ != kFastAllocationFailed) {
  *exception_msg = fast_object_copy_.exception_msg_;
  return Marker();
}
ASSERT(fast_object_copy_.exception_msg_ == kFastAllocationFailed);
}

// Use the slow copy approach.
result = slow_object_copy_.ContinueCopyGraphSlow(root, result);
ASSERT((result.ptr() == Marker()) ==
       (slow_object_copy_.exception_msg_ != nullptr));
if (result.ptr() == Marker()) {
  *exception_msg = slow_object_copy_.exception_msg_;
  return Marker();
}

result_array.SetAt(0, result);
result_array.SetAt(1, slow_object_copy_.objects_to_rehash_);
result_array.SetAt(2, slow_object_copy_.expandos_to_rehash_);
allocated_bytes_ = slow_object_copy_.slow_forward_map_.allocated_bytes;
copied_objects_ =
    slow_object_copy_.slow_forward_map_.fill_cursor_ / 2 - /*null_entry=*/1;
```
[vm/concurrency] Implement a fast transitive object copy for isolate message passing
We use message passing as comunication mechanism between isolates.
The transitive closure of an object to be sent is currently serialized
into a snapshot form and deserialized on the receiver side. Furthermore
the receiver side will re-hash any linked hashmaps in that graph.
If isolate gropus are enabled we have all isolates in a group work on
the same heap. That removes the need to use an intermediate
serialization format. It also removes the need for an O(n) step on the
receiver side.
This CL implements a fast transitive object copy implementation and
makes use of it a message that is to be passed to another isolate stays
within the same isolate group.
In the common case the object graph will fit into new space. So the
copy algorithm will try to take advantage of it by having a fast path
and a fallback path. Both of them effectively copy the graph in BFS
order.
The algorithm works effectively like a scavenge operation, but instead
of first copying the from-object to the to-space and then re-writing the
object in to-space to forward the pointers (which requires us writing to
the to-space memory twice), we only reserve space for to-objects and
then initialize the to-objects to it's final contents, including
forwarded pointers (i.e. write the to-space object only once).
Compared with a scavenge operation (which stores forwarding pointers in
the objects themselves), we use a [WeakTable] to store them. This is the
only remaining expensive part of the algorithm and could be further
optimized. To avoid relying on iterating the to-space, we'll remember
[from, to] addresses.
=> All of this works inside a [NoSafepointOperationScope] and avoids
usages of handles as well as write barriers.
While doing the transitive object copy, we'll share any object we can
safely share (canonical objects, strings, sendports, ...) instead of
copying it.
If the fast path fails (due to allocation failure or hitting) we'll
handlify any raw pointers and continue almost the same algorithm in a
safe way, where GC is possible at every object allocation site and
normal barriers are used for any stores of object pointers.
The copy algorithm uses templates to share the copy logic between the
fast and slow case (same copy routines can work on raw pointers as well
as handles).
There's a few special things to take into consideration:
* If we copy a view on external typed data we need to know the
external typed data address to compute the inner pointer of the
view, so we'll eagerly initialize external typed data.
* All external typed data needs to get a finalizer attached
(irrespective of whether the object copy succeeds or not) to ensure the
`malloc()`ed data is freed again.
* Transferables will only be transferred on successful transitive
copies. Also they need to attach finalizers to objects (which
requires all objects be in handles).
* We copy linked hashmaps as they are - instead of compressing the
data by removing deleted entries. We may need to re-hash those
hashmaps on the receiver side (similar to the snapshot-based copy
approach) since the new object graph will have no identity hash codes
assigned to it. Though if a hashmap only has sharable objects
as keys (very common, e.g. json) there is no need for re-hashing.
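The re-hash condition from the last bullet can be sketched as follows. This is a toy model with hypothetical names (`Key`, `NeedsRehashAfterCopy` are not VM API): a map's buckets depend on its keys' identity hash codes, so it only needs re-hashing if at least one key was actually copied rather than shared:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a hashmap key: either shared across the copy
// (identity preserved, e.g. canonical objects and strings) or copied
// (fresh object, no identity hash assigned yet).
struct Key {
  std::string payload;
  bool shared;  // true if the copy shares the original object
};

// A linked hashmap's bucket layout depends on its keys' identity hash
// codes. If every key is shared, the hashes are unchanged on the
// receiver side and the map can be used as-is; otherwise re-hash.
bool NeedsRehashAfterCopy(const std::vector<Key>& keys) {
  for (const Key& k : keys) {
    if (!k.shared) return true;  // copied key => new identity hash
  }
  return false;
}
```

This is why JSON-like maps (string keys, which are shared) skip the re-hash entirely.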
It changes the SendPort.* benchmarks as follows:
```
Benchmark | default | IG | IG + FOC
----------------------------------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw): | 0.25 us (1 x) | 0.26 us (0.96 x) | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw): | 4.15 us (1 x) | 1.45 us (2.86 x) | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw): | 82.16 us (1 x) | 27.17 us (3.02 x) | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw): | 784.70 us (1 x) | 242.10 us (3.24 x) | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw): | 8510.4 us (1 x) | 3083.80 us (2.76 x) | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw): | 122381.33 us (1 x) | 62959.40 us (1.94 x) | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw): | 1.91 us (1 x) | 0.92 us (2.08 x) | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw): | 6.32 us (1 x) | 2.70 us (2.34 x) | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw): | 25.24 us (1 x) | 10.47 us (2.41 x) | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw): | 104.08 us (1 x) | 41.08 us (2.53 x) | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw): | 373.39 us (1 x) | 174.11 us (2.14 x) | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw): | 1588.64 us (1 x) | 893.18 us (1.78 x) | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw): | 6849.55 us (1 x) | 3705.19 us (1.85 x) | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw): | 0.67 us (1 x) | 0.69 us (0.97 x) | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw): | 4.37 us (1 x) | 0.78 us (5.60 x) | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw): | 45.67 us (1 x) | 0.90 us (50.74 x) | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw): | 498.81 us (1 x) | 1.24 us (402.27 x) | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw): | 5366.02 us (1 x) | 4.22 us (1271.57 x) | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw): | 101050.88 us (1 x) | 20.81 us (4855.88 x) | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw): | 3.91 us (1 x) | 0.76 us (5.14 x) | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw): | 9.90 us (1 x) | 0.79 us (12.53 x) | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw): | 33.09 us (1 x) | 0.87 us (38.03 x) | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw): | 126.77 us (1 x) | 0.92 us (137.79 x) | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x) | 0.94 us (567.12 x) | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x) | 3.03 us (733.74 x) | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
  return result_array.ptr();
}

void SwitchToSlowFowardingList() {
  auto& fast_forward_map = fast_object_copy_.fast_forward_map_;
  auto& slow_forward_map = slow_object_copy_.slow_forward_map_;

  MakeUninitializedNewSpaceObjectsGCSafe();
  HandlifyTransferables();
  HandlifyWeakProperties();
[vm] Implement `WeakReference` in the VM
This CL implements `WeakReference` in the VM.
* This reduces the size of weak references from 2 objects using 8 words
to 1 object using 4 words.
* This makes loads of weak reference targets a single load instead of
two.
* This avoids the fix-point in the GC and message object copying for
weak references. (N.b. Weak references need to be processed _after_
the fix-point for weak properties.)
The semantics of weak references in messages are that their target gets
set to `null` if the target is not included in the message by a strong
reference.
The tests take particular care to exercise the case where a weak
reference's target is only kept alive because a weak property key is
alive and it refers to the target in its value. This exercises the fact
that weak references need to be processed last.
Does not add support for weak references in the app snapshot. It would
be dead code until we start using weak references in, for example, the
CFE.
This CL does not try to unify weak references and weak properties in
the GC or messaging (as proposed in go/dart-vm-weakreference), because
their semantics differ enough.
Closes: https://github.com/dart-lang/sdk/issues/48162
TEST=runtime/tests/vm/dart/finalizer/weak_reference_run_gc_test.dart
TEST=runtime/tests/vm/dart/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/lib/isolate/weak_reference_message_1_test.dart
TEST=tests/lib/isolate/weak_reference_message_2_test.dart
Change-Id: I3810e919a5866f3ae8a95bd9aa23a880a0b0921c
Cq-Include-Trybots: luci.dart.try:app-kernel-linux-debug-x64-try,dart-sdk-mac-arm64-try,vm-canary-linux-debug-try,vm-fuchsia-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-linux-debug-simriscv64-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-x64-try,vm-kernel-nnbd-linux-release-ia32-try,vm-kernel-nnbd-linux-release-simarm64-try,vm-kernel-nnbd-linux-release-simarm-try,vm-kernel-nnbd-mac-debug-arm64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-win-release-ia32-try,vm-kernel-nnbd-win-release-x64-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-android-release-arm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-kernel-win-debug-ia32-try,vm-kernel-win-debug-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/232087
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-02-10 21:59:41 +00:00
  HandlifyWeakReferences();
  HandlifyExternalTypedData();
  HandlifyObjectsToReHash();
  HandlifyExpandosToReHash();
  HandlifyFromToObjects();

  slow_forward_map.fill_cursor_ = fast_forward_map.fill_cursor_;
  slow_forward_map.allocated_bytes = fast_forward_map.allocated_bytes;
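The cursor handoff above can be illustrated with a small sketch. These are toy types (`ForwardList`, `SwitchToSlow` are hypothetical, not the VM's classes): the slow path inherits both the forwarding entries and the fill cursor, so the prefix of the graph already copied on the fast path is neither re-scanned nor re-copied:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for the fast/slow forwarding lists: (from, to) pairs
// plus a fill cursor marking how far the BFS has already processed.
struct ForwardList {
  std::vector<std::pair<int, int>> from_to;
  std::size_t fill_cursor = 0;  // entries before this are fully processed
};

// When the fast path bails out, the slow path takes over the same
// entries and resumes at the same cursor instead of starting over.
ForwardList SwitchToSlow(ForwardList fast) {
  ForwardList slow;
  slow.from_to = std::move(fast.from_to);
  slow.fill_cursor = fast.fill_cursor;
  return slow;
}
```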
}

void MakeUninitializedNewSpaceObjectsGCSafe() {
  auto& fast_forward_map = fast_object_copy_.fast_forward_map_;
  const auto length = fast_forward_map.raw_from_to_.length();
  const auto cursor = fast_forward_map.fill_cursor_;
  for (intptr_t i = cursor; i < length; i += 2) {
    auto from = fast_forward_map.raw_from_to_[i];
    auto to = fast_forward_map.raw_from_to_[i + 1];
    const uword tags = TagsFromUntaggedObject(from.untag());
    const intptr_t cid = UntaggedObject::ClassIdTag::decode(tags);
    // External typed data is already initialized.
    if (!IsExternalTypedDataClassId(cid) && !IsTypedDataViewClassId(cid)) {
#if defined(DART_COMPRESSED_POINTERS)
      const bool compressed = true;
#else
      const bool compressed = false;
#endif
      Object::InitializeObject(reinterpret_cast<uword>(to.untag()), cid,
                               from.untag()->HeapSize(), compressed);
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x) | 3.03 us (733.74 x) | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
UpdateLengthField(cid, from, to);
}
}
}

void HandlifyTransferables() {
  Handlify(&fast_object_copy_.fast_forward_map_.raw_transferables_from_to_,
           &slow_object_copy_.slow_forward_map_.transferables_from_to_);
}

void HandlifyWeakProperties() {
  Handlify(&fast_object_copy_.fast_forward_map_.raw_weak_properties_,
           &slow_object_copy_.slow_forward_map_.weak_properties_);
}
[vm] Implement `WeakReference` in the VM
This CL implements `WeakReference` in the VM.
* This reduces the size of weak references from 2 objects using 8 words
to 1 object using 4 words.
* This makes loads of weak reference targets a single load instead of
two.
* This avoids the fix-point in the GC and message object copying for
weak references. (N.b. Weak references need to be processed _after_
the fix-point for weak properties.)
The semantics of weak references in messages are that their target gets
set to `null` if the target is not included in the message by a strong
reference.
The tests take particular care to exercise the case where a weak
reference's target is only kept alive because a weak property key is
alive and it refers to the target in its value. This exercises the fact
that weak references need to be processed last.
This does not add support for weak references in the app snapshot. That
would be dead code until we start using weak references in, for
example, the CFE.
This CL does not try to unify weak references and weak properties in
the GC or messaging (as proposed in go/dart-vm-weakreference), because
their semantics differ enough.
Closes: https://github.com/dart-lang/sdk/issues/48162
TEST=runtime/tests/vm/dart/finalizer/weak_reference_run_gc_test.dart
TEST=runtime/tests/vm/dart/isolates/fast_object_copy_test.dart
TEST=runtime/vm/object_test.cc
TEST=tests/lib/isolate/weak_reference_message_1_test.dart
TEST=tests/lib/isolate/weak_reference_message_2_test.dart
Change-Id: I3810e919a5866f3ae8a95bd9aa23a880a0b0921c
Cq-Include-Trybots: luci.dart.try:app-kernel-linux-debug-x64-try,dart-sdk-mac-arm64-try,vm-canary-linux-debug-try,vm-fuchsia-release-x64-try,vm-kernel-gcc-linux-try,vm-kernel-asan-linux-release-x64-try,vm-kernel-linux-debug-x64c-try,vm-kernel-linux-debug-x64-try,vm-kernel-linux-debug-simriscv64-try,vm-kernel-mac-debug-x64-try,vm-kernel-nnbd-linux-debug-x64-try,vm-kernel-nnbd-linux-release-ia32-try,vm-kernel-nnbd-linux-release-simarm64-try,vm-kernel-nnbd-linux-release-simarm-try,vm-kernel-nnbd-mac-debug-arm64-try,vm-kernel-nnbd-mac-debug-x64-try,vm-kernel-nnbd-win-release-ia32-try,vm-kernel-nnbd-win-release-x64-try,vm-kernel-optcounter-threshold-linux-release-x64-try,vm-kernel-precomp-android-release-arm_x64-try,vm-kernel-precomp-linux-debug-x64-try,vm-kernel-precomp-linux-debug-x64c-try,vm-kernel-precomp-win-debug-x64c-try,vm-kernel-reload-rollback-linux-debug-x64-try,vm-kernel-reload-linux-debug-x64-try,vm-kernel-win-debug-ia32-try,vm-kernel-win-debug-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/232087
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Daco Harkes <dacoharkes@google.com>
2022-02-10 21:59:41 +00:00
void HandlifyWeakReferences() {
  Handlify(&fast_object_copy_.fast_forward_map_.raw_weak_references_,
           &slow_object_copy_.slow_forward_map_.weak_references_);
}
void HandlifyExternalTypedData() {
  Handlify(&fast_object_copy_.fast_forward_map_.raw_external_typed_data_to_,
           &slow_object_copy_.slow_forward_map_.external_typed_data_);
}
void HandlifyObjectsToReHash() {
  Handlify(&fast_object_copy_.fast_forward_map_.raw_objects_to_rehash_,
           &slow_object_copy_.slow_forward_map_.objects_to_rehash_);
}

void HandlifyExpandosToReHash() {
  Handlify(&fast_object_copy_.fast_forward_map_.raw_expandos_to_rehash_,
           &slow_object_copy_.slow_forward_map_.expandos_to_rehash_);
}

template <typename RawType, typename HandleType>
void Handlify(GrowableArray<RawType>* from,
              GrowableArray<const HandleType*>* to) {
  const auto length = from->length();
[vm/concurrency] Implement a fast transitive object copy for isolate message passing
We use message passing as comunication mechanism between isolates.
The transitive closure of an object to be sent is currently serialized
into a snapshot form and deserialized on the receiver side. Furthermore
the receiver side will re-hash any linked hashmaps in that graph.
If isolate gropus are enabled we have all isolates in a group work on
the same heap. That removes the need to use an intermediate
serialization format. It also removes the need for an O(n) step on the
receiver side.
This CL implements a fast transitive object copy implementation and
makes use of it a message that is to be passed to another isolate stays
within the same isolate group.
In the common case the object graph will fit into new space. So the
copy algorithm will try to take advantage of it by having a fast path
and a fallback path. Both of them effectively copy the graph in BFS
order.
The algorithm works effectively like a scavenge operation, but instead
of first copying the from-object to the to-space and then re-writing the
object in to-space to forward the pointers (which requires us writing to
the to-space memory twice), we only reserve space for to-objects and
then initialize the to-objects to it's final contents, including
forwarded pointers (i.e. write the to-space object only once).
Compared with a scavenge operation (which stores forwarding pointers in
the objects themselves), we use a [WeakTable] to store them. This is the
only remaining expensive part of the algorithm and could be further
optimized. To avoid relying on iterating the to-space, we'll remember
[from, to] addresses.
=> All of this works inside a [NoSafepointOperationScope] and avoids
usages of handles as well as write barriers.
While doing the transitive object copy, we'll share any object we can
safely share (canonical objects, strings, sendports, ...) instead of
copying it.
If the fast path fails (due to allocation failure or hitting) we'll
handlify any raw pointers and continue almost the same algorithm in a
safe way, where GC is possible at every object allocation site and
normal barriers are used for any stores of object pointers.
The copy algorithm uses templates to share the copy logic between the
fast and slow case (same copy routines can work on raw pointers as well
as handles).
There's a few special things to take into consideration:
* If we copy a view on external typed data we need to know the
external typed data address to compute the inner pointer of the
view, so we'll eagerly initialize external typed data.
* All external typed data needs to get a finalizer attached
(irrespective if the object copy suceeds or not) to ensure the
`malloc()`ed data is freed again.
* Transferables will only be transferred on successful transitive
copies. Also they need to attach finalizers to objects (which
requires all objects be in handles).
* We copy linked hashmaps as they are - instead of compressing the
data by removing deleted entries. We may need to re-hash those
hashmaps on the receiver side (similar to the snapshot-based copy
approach) since new object graph will have no identity hash codes
assigned to them. Though if the hashmaps only has sharable objects
as keys (very common, e.g. json) there is no need for re-hashing.
It changes the SendPort.* benchmarks as follows:
```
Benchmark | default | IG | IG + FOC
----------------------------------------------------------------------------------------------------------------------------
SendPort.Send.Nop(RunTimeRaw): | 0.25 us (1 x) | 0.26 us (0.96 x) | 0.25 us (1.00 x)
SendPort.Send.Json.400B(RunTimeRaw): | 4.15 us (1 x) | 1.45 us (2.86 x) | 1.05 us (3.95 x)
SendPort.Send.Json.5KB(RunTimeRaw): | 82.16 us (1 x) | 27.17 us (3.02 x) | 18.32 us (4.48 x)
SendPort.Send.Json.50KB(RunTimeRaw): | 784.70 us (1 x) | 242.10 us (3.24 x) | 165.50 us (4.74 x)
SendPort.Send.Json.500KB(RunTimeRaw): | 8510.4 us (1 x) | 3083.80 us (2.76 x) | 2311.29 us (3.68 x)
SendPort.Send.Json.5MB(RunTimeRaw): | 122381.33 us (1 x) | 62959.40 us (1.94 x) | 55492.10 us (2.21 x)
SendPort.Send.BinaryTree.2(RunTimeRaw): | 1.91 us (1 x) | 0.92 us (2.08 x) | 0.72 us (2.65 x)
SendPort.Send.BinaryTree.4(RunTimeRaw): | 6.32 us (1 x) | 2.70 us (2.34 x) | 2.10 us (3.01 x)
SendPort.Send.BinaryTree.6(RunTimeRaw): | 25.24 us (1 x) | 10.47 us (2.41 x) | 8.61 us (2.93 x)
SendPort.Send.BinaryTree.8(RunTimeRaw): | 104.08 us (1 x) | 41.08 us (2.53 x) | 33.51 us (3.11 x)
SendPort.Send.BinaryTree.10(RunTimeRaw): | 373.39 us (1 x) | 174.11 us (2.14 x) | 134.75 us (2.77 x)
SendPort.Send.BinaryTree.12(RunTimeRaw): | 1588.64 us (1 x) | 893.18 us (1.78 x) | 532.05 us (2.99 x)
SendPort.Send.BinaryTree.14(RunTimeRaw): | 6849.55 us (1 x) | 3705.19 us (1.85 x) | 2507.90 us (2.73 x)
SendPort.Receive.Nop(RunTimeRaw): | 0.67 us (1 x) | 0.69 us (0.97 x) | 0.68 us (0.99 x)
SendPort.Receive.Json.400B(RunTimeRaw): | 4.37 us (1 x) | 0.78 us (5.60 x) | 0.77 us (5.68 x)
SendPort.Receive.Json.5KB(RunTimeRaw): | 45.67 us (1 x) | 0.90 us (50.74 x) | 0.87 us (52.49 x)
SendPort.Receive.Json.50KB(RunTimeRaw): | 498.81 us (1 x) | 1.24 us (402.27 x) | 1.06 us (470.58 x)
SendPort.Receive.Json.500KB(RunTimeRaw): | 5366.02 us (1 x) | 4.22 us (1271.57 x) | 4.65 us (1153.98 x)
SendPort.Receive.Json.5MB(RunTimeRaw): | 101050.88 us (1 x) | 20.81 us (4855.88 x) | 21.0 us (4811.95 x)
SendPort.Receive.BinaryTree.2(RunTimeRaw): | 3.91 us (1 x) | 0.76 us (5.14 x) | 0.74 us (5.28 x)
SendPort.Receive.BinaryTree.4(RunTimeRaw): | 9.90 us (1 x) | 0.79 us (12.53 x) | 0.76 us (13.03 x)
SendPort.Receive.BinaryTree.6(RunTimeRaw): | 33.09 us (1 x) | 0.87 us (38.03 x) | 0.84 us (39.39 x)
SendPort.Receive.BinaryTree.8(RunTimeRaw): | 126.77 us (1 x) | 0.92 us (137.79 x) | 0.88 us (144.06 x)
SendPort.Receive.BinaryTree.10(RunTimeRaw): | 533.09 us (1 x) | 0.94 us (567.12 x) | 0.92 us (579.45 x)
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x) | 3.03 us (733.74 x) | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
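The commit message above describes copying the object graph in BFS order, reserving space for each to-object, recording from→to forwarding in an external table (a `WeakTable` in the VM), and writing each to-object exactly once with its pointers already forwarded. A minimal standalone sketch of that idea — `Node`, `CopyGraph`, and the `std::unordered_map` forwarding table are illustrative stand-ins, not VM types:

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <unordered_map>
#include <vector>

struct Node {
  int value = 0;
  std::vector<Node*> children;  // outgoing pointers in the object graph
};

// Copies the graph reachable from `root` in BFS order. `arena` owns the
// new nodes. The map plays the role of the VM's WeakTable: forwarding
// information lives outside the objects themselves.
Node* CopyGraph(Node* root, std::vector<std::unique_ptr<Node>>& arena) {
  std::unordered_map<Node*, Node*> forward;
  std::queue<Node*> worklist;

  // Returns the to-object for `from`, reserving (but not initializing)
  // a new one on first sight -- mirroring "reserve space, write once".
  auto forward_of = [&](Node* from) -> Node* {
    auto it = forward.find(from);
    if (it != forward.end()) return it->second;
    arena.push_back(std::make_unique<Node>());
    Node* to = arena.back().get();
    forward.emplace(from, to);  // registered before processing => cycles work
    worklist.push(from);
    return to;
  };

  Node* to_root = forward_of(root);
  while (!worklist.empty()) {
    Node* from = worklist.front();
    worklist.pop();
    Node* to = forward[from];
    // Initialize the to-object exactly once, with forwarded pointers.
    to->value = from->value;
    to->children.reserve(from->children.size());
    for (Node* child : from->children) {
      to->children.push_back(forward_of(child));
    }
  }
  return to_root;
}
```

Because forwarding entries are registered before a node is processed, cyclic graphs copy correctly, and no to-object is ever rewritten after initialization.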
    if (length > 0) {
      to->Resize(length);
      for (intptr_t i = 0; i < length; i++) {
        (*to)[i] = &HandleType::Handle(Z, (*from)[i]);
      }
      from->Clear();
    }
  }

  void HandlifyFromToObjects() {
    auto& fast_forward_map = fast_object_copy_.fast_forward_map_;
    auto& slow_forward_map = slow_object_copy_.slow_forward_map_;
    const intptr_t length = fast_forward_map.raw_from_to_.length();
    slow_forward_map.from_to_transition_.Resize(length);
    for (intptr_t i = 0; i < length; i++) {
      slow_forward_map.from_to_transition_[i] =
          &PassiveObject::Handle(Z, fast_forward_map.raw_from_to_[i]);
    }
    ASSERT(slow_forward_map.from_to_transition_.length() == length);
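The commit also notes that the copy routines are templated so the same logic runs over raw pointers on the fast path and over handles on the slow path (where GC can occur at every allocation site). A minimal sketch of that pattern — `Obj`, `RawPtr`, and `Handle` here are hypothetical stand-ins; real VM handles are registered with the current zone or scope:

```cpp
#include <cassert>

struct Obj {
  int payload;
};

// Fast mode: works directly on a raw pointer; no GC may happen while
// such references are live.
struct RawPtr {
  Obj* ptr;
  Obj* get() const { return ptr; }
};

// Slow mode: one extra indirection, so the referenced object could be
// relocated by a GC between uses (simplified illustration).
struct Handle {
  Obj** slot;
  Obj* get() const { return *slot; }
};

// A single templated routine serves both modes, so the fast and slow
// paths share one implementation of the copy logic.
template <typename Ref>
int CopyPayload(const Ref& from) {
  return from.get()->payload;
}
```

The slow path can then reuse every copy routine unchanged after handlifying the raw from/to pointers, which is what `HandlifyFromToObjects` above prepares for.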
SendPort.Receive.BinaryTree.12(RunTimeRaw): | 2223.23 us (1 x) | 3.03 us (733.74 x) | 3.04 us (731.33 x)
SendPort.Receive.BinaryTree.14(RunTimeRaw): | 8945.66 us (1 x) | 4.03 us (2219.77 x) | 4.30 us (2080.39 x)
```
Issue https://github.com/dart-lang/sdk/issues/36097
TEST=vm/dart{,_2}/isolates/fast_object_copy{,2}_test
Change-Id: I835c59dab573d365b8a4b9d7c5359a6ea8d8b0a7
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/203776
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Ryan Macnak <rmacnak@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
Reviewed-by: Alexander Aprelev <aam@google.com>
2021-07-13 19:04:20 +00:00
    fast_forward_map.raw_from_to_.Clear();
  }

  void ObjectifyFromToObjects() {
    auto& from_to_transition =
        slow_object_copy_.slow_forward_map_.from_to_transition_;
    auto& from_to = slow_object_copy_.slow_forward_map_.from_to_;
    intptr_t length = from_to_transition.length();
    from_to = GrowableObjectArray::New(length, Heap::kOld);
    for (intptr_t i = 0; i < length; i++) {
      from_to.Add(*from_to_transition[i]);
    }
    ASSERT(from_to.Length() == length);
    from_to_transition.Clear();
  }
  void ThrowException(const char* exception_msg) {
    const auto& msg_obj = String::Handle(Z, String::New(exception_msg));
    const auto& args = Array::Handle(Z, Array::New(1));
    args.SetAt(0, msg_obj);
    Exceptions::ThrowByType(Exceptions::kArgument, args);
    UNREACHABLE();
  }

  Thread* thread_;
  Zone* zone_;
  FastObjectCopy fast_object_copy_;
  SlowObjectCopy slow_object_copy_;
  intptr_t copied_objects_ = 0;
  intptr_t allocated_bytes_ = 0;
};

ObjectPtr CopyMutableObjectGraph(const Object& object) {
  auto thread = Thread::Current();
  TIMELINE_DURATION(thread, Isolate, "CopyMutableObjectGraph");
  ObjectGraphCopier copier(thread);
  ObjectPtr result = copier.CopyObjectGraph(object);
#if defined(SUPPORT_TIMELINE)
  if (tbes.enabled()) {
    tbes.SetNumArguments(2);
    tbes.FormatArgument(0, "CopiedObjects", "%" Pd, copier.copied_objects());
    tbes.FormatArgument(1, "AllocatedBytes", "%" Pd, copier.allocated_bytes());
  }
#endif
  return result;
}

}  // namespace dart