Previously these functions would only contain a single CheckStackOverflowInstr
in a backtracking block, and that CheckStackOverflowInstr would have a zero
loop_depth - which means it would not be considered eligible for OSR.
This change:
* adds CheckStackOverflowInstr with non-zero loop_depth in two other places
(Boyer-Moore lookahead skip loop and greedy loop) where loops arise in the
generated IL;
* sets non-zero loop depth on the CheckStackOverflowInstr in the backtracking
block;
* adds a flag on CheckStackOverflowInstr that allows the optimizing compiler to
optimize away those checks that were inserted solely to serve as OSR entries;
* ensures that IR generated by IRRegExpMacroAssembler is OSR compatible:
* GraphEntryInstr has correct osr_id;
* GraphEntry and normal entry have different block ids (B0 and B1 - instead of B0 and B0);
* unreachable blocks are pruned and GraphEntry is rewired to point to OSR entry;
* IRRegExpMacroAssembler::GrowStack should not assume that stack_array_cell and :stack
are always in sync, because :stack can come from OSR or deoptimization, while stack_array_cell
is a constant associated with a particular Code object;
* refactors the way the RegExp stack was growing: instead of having a special instruction
just emit a call to a Dart function;
* refactors the way block pruning for OSR is done by consolidating duplicated code
in a single function.
We allow the optimizing compiler to remove preemption checks from
non-backtracking loops in the regexp code because those loops,
unlike backtracking, have guaranteed O(input_length) time
complexity.
Performance Implications
------------------------
This change improves performance of regexps in cases where the regexp spends a lot
of time in the first invocation (either due to backtracking or due to a long
non-matching prefix) by allowing the VM to optimize the :matcher while :matcher is
running.
For example, on the regex-redux[1] benchmark it improves Dart performance by 3x
(from ~18s to ~6s on my MacBook Pro).
CL history
----------
This relands commit d87cc52c3e.
Original code review: https://codereview.chromium.org/2950783003/
[1] https://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexredux&lang=dart&id=2
R=erikcorry@google.com
Review-Url: https://codereview.chromium.org/2951053003 .
Inline instance object hash code into object header on 64 bit.
64 bit objects have 32 bits of free space in the header word.
This is used for the hash code in string objects. We take it
for the default hash code on all objects that don't override
the hashCode getter.
This is both faster and a memory reduction. E.g. it makes the
MegaHashCode part of the Megamorphic benchmark 6 times faster.
This is a reland of https://codereview.chromium.org/2912863006/
It fixes issues with the 32 bit compare-swap instruction on
ARM64 and fixes a fragile tree shaking test that is sensitive
to which private methods are in the core libraries.
R=kustermann@google.com, vegorov@google.com
BUG=
Review-Url: https://codereview.chromium.org/2954453002 .
I collected statistics for the sizes and capacities of growable arrays which are promoted to old-space or survive an old-space gc when running dart2js and Fasta. For these applications, the vast majority of arrays stay empty. More than half of the total object size of promoted backing arrays is backing for empty growable arrays.
Furthermore, since the overhead for an array is 3 words (header, type parameters and length), and object sizes are rounded up to an even number of words, we waste one word for all even-sized arrays.
This CL changes the growth strategy so that empty growable arrays are created with a shared, zero-sized array as backing, avoiding the allocation of a backing array if no elements are added. When the array needs to grow, it starts out at 3 and grows to double size plus one each time: 7, 15, 31, ...
A few places in the VM code need to handle these shared, zero-sized arrays specially. In particular, the Array::MakeArray function needs to allocate a new, empty array if its result is to be returned to Dart code.
Benchmarks suggest that the change improves memory usage by a few percent overall and does not significantly affect run time.
BUG=
R=erikcorry@google.com
Review-Url: https://codereview.chromium.org/2949803002 .
64 bit objects have 32 bits of free space in the header word.
This is used for the hash code in string objects. We take it
for the default hash code on all objects that don't override
the hashCode getter.
This is both faster and a memory reduction. E.g. it shaves about
70% off the running time of this microbenchmark:
List list = [];

class Thing {
  get hashCode => 42;
}

class Thing2 {
  get hashCode => 42;
}

class Thing3 {}

class Thing4 {}

main() {
  int sum = 103;
  for (int i = 0; i < 10000000; i++) {
    list = [];
    list.add("foo");
    list.add(123);
    list.add(1.23);
    list.add(new Object());
    list.add(new Thing());
    list.add(new Thing2());
    list.add(new Thing3());
    list.add(new Thing4());
    for (int j = 0; j < 2; j++) {
      sum ^= biz(list);
    }
  }
  print(sum);
}

int biz(List list) {
  int sum = 103;
  for (var x in list) {
    sum ^= x.hashCode;
  }
  return sum;
}
R=rmacnak@google.com, vegorov@google.com
BUG=
Review-Url: https://codereview.chromium.org/2912863006 .
Do this in unoptimized code only, when --reify-generic-functions is specified.
This is still work in progress; support in the optimizer, the inliner, DBC,
kernel-to-IR translation, and other areas will follow.
Many small fixes and added todos.
R=rmacnak@google.com, vegorov@google.com
Review-Url: https://codereview.chromium.org/2941643002 .
Mostly stream kernel_reader, i.e. the code that sets up the libraries,
classes, methods etc.
"Mostly" because it still takes a "Program" ast node and looks at the
"Library" ast nodes to get their kernel offsets in the binary.
Currently the scripts (containing breakable points etc) are also created
from the ast nodes.
The rest is now streamed.
This also means that more ast visitors could be deleted.
R=kmillikin@google.com
Review-Url: https://codereview.chromium.org/2931813002 .
Previously these functions would only contain a single CheckStackOverflowInstr
in a backtracking block, and that CheckStackOverflowInstr would have a zero
loop_depth - which means it would not be considered eligible for OSR.
This change:
* adds CheckStackOverflowInstr with non-zero loop_depth in two other places
(Boyer-Moore lookahead skip loop and greedy loop) where loops arise in the
generated IL;
* sets non-zero loop depth on the CheckStackOverflowInstr in the backtracking
block;
* adds a flag on CheckStackOverflowInstr that allows the optimizing compiler to
optimize away those checks that were inserted solely to serve as OSR entries.
We allow the optimizing compiler to remove preemption checks from
non-backtracking loops in the regexp code because those loops,
unlike backtracking, have guaranteed O(input_length) time
complexity.
Performance Implications
------------------------
This change improves performance of regexps in cases where the regexp spends a lot
of time in the first invocation (either due to backtracking or due to a long
non-matching prefix) by allowing the VM to optimize the :matcher while :matcher is
running.
For example, on the regex-redux[1] benchmark it improves Dart performance by 3x
(from ~18s to ~6s on my MacBook Pro).
[1] https://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexredux&lang=dart&id=2
BUG=
R=erikcorry@google.com
Review-Url: https://codereview.chromium.org/2950783003 .
Catch simply gave NullConstant as arguments to _instanceOf;
now I've copied what the IsExpression does, and made
LoadInstantiatorTypeArguments actually work in this case
(by filling out the scopes_->this_variable value,
by actually visiting the catch guard in the scope builder
rather than skipping it).
Fixes #29553.
BUG=
R=kmillikin@google.com
Review-Url: https://codereview.chromium.org/2938173002 .
Before this, if a generic class was instantiated with only dynamic,
TypeArguments::null would be used as the type argument, ignoring
types from the super.
Now it only returns TypeArguments::null if the class directly gives all
type arguments (and they are all dynamic).
Fixes #29537.
BUG=
R=kmillikin@google.com
Review-Url: https://codereview.chromium.org/2941983002 .
- Add TokenPosition to AllocateObject,
meaning that kernel more often has a TokenPosition available.
This might influence profiling; in particular, it influences some
of the vm/cc/Profiler* tests.
- Update profiler_service to also be able to find the current token
via kernel (as opposed to either returning NULL or crashing).
This makes use of the source code included in the kernel file.
BUG=
R=kmillikin@google.com, vegorov@google.com
Review-Url: https://codereview.chromium.org/2944433003 .
On a switch fall through error, Fasta currently generates
```
throw new core::FallThroughError::•();
```
which generates the error-message via the VM:
```
'null': Switch case fall-through at line null.
```
This introduces a new constructor taking a url and a line number,
which can then give a better error message.
BUG=
R=ahe@google.com
Review-Url: https://codereview.chromium.org/2951453002 .
User data may be reachable only through a closure's environment, represented as a Context object. Note that we still don't consider Functions to be user objects here, and so avoid blaming the size of compiled code against any user object.
R=cbernaschina@google.com
Review-Url: https://codereview.chromium.org/2947673002 .
The code in the patch is now inlined into the vmservice library.
This is being done because the vmservice-related libraries are
now compiled directly from source instead of from the "patched_sdk",
so what is being compiled now does not have the vmservice_patch
applied. By removing the patch, we remove the need to
artificially patch the vmservice library, making
vmservice_io.dill complete.
R=rmacnak@google.com
Review-Url: https://codereview.chromium.org/2946773002 .
I was trying to debug an issue and noticed that the printing of LetNode
is kind of useless. It didn't print the variables even though they had
references, which seems confusing, and it wrote all the initializers
and body nodes at the same nesting level, which makes it impossible to
see where one ends and the other begins.
BUG=
R=vegorov@google.com
Review-Url: https://codereview.chromium.org/2946903002 .
Refactorings.
Mostly about reading FunctionNode in only one place, by introducing a
helper class that reads and skips what it is told.
For 'nested' things inside the function node (e.g. the body),
the caller of the helper still needs to handle them if they shouldn't
just be skipped.
'Non-nested' things (e.g. integers) are saved and can be fetched
by the caller.
R=kmillikin@google.com
Review-Url: https://codereview.chromium.org/2921613003 .
This helper function was being called before its argument was
initialized, so it was passing null. Instead, it should be called
after its argument is initialized.
Because the initialization happens in Kernel code, it is simplest to
insert the call explicitly in Kernel code as well, as part of the async
transformation. This has the consequence that we now call the helper
function even when the flag causal_async_stacks is false.
Fixes #29771.
BUG=
R=aam@google.com, asiva@google.com
Review-Url: https://codereview.chromium.org/2936793003 .