This work pulls in v8 support for these features with
appropriate changes for Dart and closes
https://github.com/dart-lang/sdk/issues/34935.
This adds support for the following features:
* Interpreting patterns as Unicode patterns instead of
BMP patterns
* the dotAll flag (`/s`) for changing the behavior
of '.' to also match line terminators
* Escapes for character classes described by Unicode
property groups (e.g., \p{Greek} to match all Greek
characters, or \P{Greek} for all non-Greek characters).
The following TC39 proposals describe some of the added features:
* https://github.com/tc39/proposal-regexp-dotall-flag
* https://github.com/tc39/proposal-regexp-unicode-property-escapes
These additional changes are included:
* Extends named capture group names to include the full
range of identifier characters supported by ECMAScript,
not just ASCII.
* Changing the RegExp interface to return RegExpMatch
objects, not Match objects, so that downcasting is
not necessary to use named capture groups from Dart
**Note**: The changes to the RegExp interface are a
breaking change for implementers of the RegExp interface.
Current users of the RegExp interface (i.e., code using Dart
RegExp objects) will not be affected.
Change-Id: Ie62e6082a0e2fedc1680ef2576ce0c6db80fc19a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/100641
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Stevie Strickland <sstrickl@google.com>
This reverts commit 5ebb640a67.
Reason for revert: <INSERT REASONING HERE>
Original change's description:
> [vm] Finish adding support for ECMAScript 2018 features.
>
> This work pulls in v8 support for these features with
> appropriate changes for Dart and closes
> https://github.com/dart-lang/sdk/issues/34935.
>
> This adds support for the following features:
>
> * Interpreting patterns as Unicode patterns instead of
> BMP patterns
> * the dotAll flag (`/s`) for changing the behavior
> of '.' to also match line terminators
> * Escapes for character classes described by Unicode
> property groups (e.g., \p{Greek} to match all Greek
> characters, or \P{Greek} for all non-Greek characters).
>
> The following TC39 proposals describe some of the added features:
>
> * https://github.com/tc39/proposal-regexp-dotall-flag
> * https://github.com/tc39/proposal-regexp-unicode-property-escapes
>
> These additional changes are included:
>
> * Extends named capture group names to include the full
> range of identifier characters supported by ECMAScript,
> not just ASCII.
> * Changing the RegExp interface to return RegExpMatch
> objects, not Match objects, so that downcasting is
> not necessary to use named capture groups from Dart
>
> **Note**: The changes to the RegExp interface are a
> breaking change for implementers of the RegExp interface.
> Current users of the RegExp interface (i.e., code using Dart
> RegExp objects) will not be affected.
>
> Change-Id: I0709ed0a8d5db36680e32bbad585594857b9ace4
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/95651
> Commit-Queue: Stevie Strickland <sstrickl@google.com>
> Reviewed-by: Johnni Winther <johnniwinther@google.com>
> Reviewed-by: Lasse R.H. Nielsen <lrn@google.com>
> Reviewed-by: Martin Kustermann <kustermann@google.com>
TBR=lrn@google.com,kustermann@google.com,jmesserly@google.com,johnniwinther@google.com,sstrickl@google.com
# Not skipping CQ checks because original CL landed > 1 day ago.
Change-Id: I1eda0fee4fd9e94df095944049833a67b07277e2
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/100560
Reviewed-by: Keerti Parthasarathy <keertip@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Keerti Parthasarathy <keertip@google.com>
This work pulls in v8 support for these features with
appropriate changes for Dart and closes
https://github.com/dart-lang/sdk/issues/34935.
This adds support for the following features:
* Interpreting patterns as Unicode patterns instead of
BMP patterns
* the dotAll flag (`/s`) for changing the behavior
of '.' to also match line terminators
* Escapes for character classes described by Unicode
property groups (e.g., \p{Greek} to match all Greek
characters, or \P{Greek} for all non-Greek characters).
The following TC39 proposals describe some of the added features:
* https://github.com/tc39/proposal-regexp-dotall-flag
* https://github.com/tc39/proposal-regexp-unicode-property-escapes
These additional changes are included:
* Extends named capture group names to include the full
range of identifier characters supported by ECMAScript,
not just ASCII.
* Changing the RegExp interface to return RegExpMatch
objects, not Match objects, so that downcasting is
not necessary to use named capture groups from Dart
**Note**: The changes to the RegExp interface are a
breaking change for implementers of the RegExp interface.
Current users of the RegExp interface (i.e., code using Dart
RegExp objects) will not be affected.
Change-Id: I0709ed0a8d5db36680e32bbad585594857b9ace4
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/95651
Commit-Queue: Stevie Strickland <sstrickl@google.com>
Reviewed-by: Johnni Winther <johnniwinther@google.com>
Reviewed-by: Lasse R.H. Nielsen <lrn@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
See https://github.com/tc39/proposal-regexp-named-groups
for a high-level description of the feature and examples. This is one of the
features requested in https://github.com/dart-lang/sdk/issues/34935.
This is a partial implementation because while there is a way to retrieve
groups via Dart by name, it requires casting the returned Match to the
new RegExpMatch interface to avoid changing the RegExp interface.
Changing the RegExp interface will happen in a future update, since there
are other planned changes to the RegExp interface coming soon and that way
we only change it once. See https://github.com/dart-lang/sdk/issues/36171
for more details on the planned changes.
Also, since only BMP regular expressions are supported, not full
Unicode ones (i.e., those with the /u flag in ECMAscript), \k<NAME>
will only be parsed as a named back reference if there are named
captures in the string. Otherwise, the \k will be parsed as the identity
escape for backwards compatibility. The new tests illustrate this
difference.
Change-Id: Ieeb0374813db78924c9aa8ac3e652dfb6d4a5934
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/95461
Commit-Queue: Stevie Strickland <sstrickl@google.com>
Reviewed-by: Lasse R.H. Nielsen <lrn@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Jenny Messerly <jmesserly@google.com>
Reviewed-by: Johnni Winther <johnniwinther@google.com>
See https://github.com/tc39/proposal-regexp-lookbehind
for a high-level description of the feature and examples. This is one of the
features requested in https://github.com/dart-lang/sdk/issues/34935.
This work takes the feature as present in the v8 engine and appropriately
merges it into our irregexp fork. Notable changes to the irregexp codebase to
introduce this feature:
-----
We can no longer assume that all matching proceeds forwards, since lookbehind
matching proceeds backwards. Similarly, we cannot assume that we can only be
at the start of a string if we started matching from that point. The direction
of matching must also be taken into consideration when doing bounds checking,
which previously assumed the engine would never attempt to look before the
start of a string.
-----
We may now parse backreferences to captures before the capture they
reference, since we parse regular expressions left to right, but lookbehinds
perform captures as they evaluate the string from right to left. Since
RegExpBackReference objects contain a pointer to their corresponding capture,
this means that we may need to create RegExpCapture objects prior to the
parsing of the corresponding captured subexpression.
Thus, RegExpCapture objects are now only initialized with their index, and the
body is set later when the subexpression is encountered and parsed. This means
any method that operates on the body of a RegExpCapture can no longer be const,
which also affects the rest of the RegExpTree class hierarchy. This also means
that we don't have a valid max_match length for backreferences based off the
capture body, and must assume they can end up being any length.
-----
Change-Id: Iffe0e71b17b1a0c6fea77235e8aee5c093005811
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/94540
Commit-Queue: Stevie Strickland <sstrickl@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
New folder structure (nested under vm/):
- compiler/
- jit/ - JIT specific code
- aot/ - AOT specific code
- backend/ - all middle-end and back-end code (IL, flow graph)
- assembler/ - assemblers and disassemblers
- frontend/ - front ends (AST -> IL, Kernel -> IL)
compiler/README.md would be the documentation root for the compiler
pipeline
Bug: https://github.com/dart-lang/sdk/issues/30575
Change-Id: I2dfd9688793bff737f7632ddc77fca766875ce36
Reviewed-on: https://dart-review.googlesource.com/2940
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Commit-Queue: Vyacheslav Egorov <vegorov@google.com>
Previously these functions would only contain a single CheckStackOverflowInstr
in a backtracking block and that CheckStackOverflowInstr would have a zero
loop_depth - which means it would not be considered eligable for OSR.
This change:
* adds CheckStackOverflowInstr with non-zero loop_depth in two other places
(Boyer-Moore lookahead skip loop and greedy loop) where loops arise in the
generated IL;
* sets non-zero loop depth on the CheckStackOverflowInstr in the backtracking
block;
* adds a flag on CheckStackOverflowInstr that allows optimizing compiler to
optimize away those checks that were inserted solely to serve as OSR entries.
* ensures that IR generated by IRRegExpMacroAssembler is OSR compatible:
* GraphEntryInstr has correct osr_id;
* GraphEntry and normal entry have different block ids (B0 and B1 - instead of B0 and B0);
* unreachable blocks are pruned and GraphEntry is rewired to point to OSR entry;
* IRRegExpMacroAssembler::GrowStack should not assume that stack_array_cell and :stack
are always in sync, because :stack can come from OSR or deoptimization why stack_array_cell
is a constant associated with a particular Code object.
* refactors the way the RegExp stack was growing: instead of having a special instruction
just emit a call to a Dart function;
* refactors the way block pruning for OSR is done by consolidating duplicated code
in a single function.
We allow the optimizing compiler to remove preemption checks from
non-backtracking loops in the regexp code because those loops
unlike backtracking have guaranteed O(input_length) time
complexity.
Performance Implications
------------------------
This change improves performance of regexps in cases where regexp spends a lot
of time in the first invocation (either due to backtracking or due to long non
matching prefix) by allowing VM to optimize the :matcher while :matcher is
running.
For example on regex-redux[1] benchmark it improves Dart performance by 3x
(from ~18s to ~6s on my Mac Book Pro).
CL history
----------
This relands commit d87cc52c3e.
Original code review: https://codereview.chromium.org/2950783003/
[1] https://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexredux&lang=dart&id=2R=erikcorry@google.com
Review-Url: https://codereview.chromium.org/2951053003 .
Previously these functions would only contain a single CheckStackOverflowInstr
in a backtracking block and that CheckStackOverflowInstr would have a zero
loop_depth - which means it would not be considered eligable for OSR.
This change:
* adds CheckStackOverflowInstr with non-zero loop_depth in two other places
(Boyer-Moore lookahead skip loop and greedy loop) where loops arise in the
generated IL;
* sets non-zero loop depth on the CheckStackOverflowInstr in the backtracking
block;
* adds a flag on CheckStackOverflowInstr that allows optimizing compiler to
optimize away those checks that were inserted solely to serve as OSR entries.
We allow the optimizing compiler to remove preemption checks from
non-backtracking loops in the regexp code because those loops
unlike backtracking have guaranteed O(input_length) time
complexity.
Performance Implications
------------------------
This change improves performance of regexps in cases where regexp spends a lot
of time in the first invocation (either due to backtracking or due to long non
matching prefix) by allowing VM to optimize the :matcher while :matcher is
running.
For example on regex-redux[1] benchmark it improves Dart performance by 3x
(from ~18s to ~6s on my Mac Book Pro).
[1] https://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexredux&lang=dart&id=2
BUG=
R=erikcorry@google.com
Review-Url: https://codereview.chromium.org/2950783003 .
i.e. #ifndef VM_WHATEVER -> #ifndef RUNTIME_VM_WHATEVER
This lets us remove a hack from the PRESUBMIT.py script that existed
for reasons that are no longer valid, and sets us up to add some
presubmit checks for the GN build.
R=asiva@google.com, rmacnak@google.com
Review URL: https://codereview.chromium.org/2450713004 .
Refactor all remaning cases where the current zone is used through new(Isolate*) and remove this interface.
Removing this interface is needed to move towards multiple threads per isolate, and also makes the caller more aware of the scope of the zone used, reducing the risk of use-after-free.
Make the current thread and the stack zone created around native/runtime entries directly available in their body, saving an indirection (and optimized away if unused).
R=iposva@google.com
Review URL: https://codereview.chromium.org//982873004
git-svn-id: https://dart.googlecode.com/svn/branches/bleeding_edge/dart@44541 260f80e4-7a28-3924-810f-c04153c831b5