Commit graph

23 commits

Author SHA1 Message Date
Stevie Strickland 4028fec3b5 Reland "[vm] Finish adding support for ECMAScript 2018 features."
This work pulls in v8 support for these features with
appropriate changes for Dart and closes
https://github.com/dart-lang/sdk/issues/34935.

This adds support for the following features:

* Interpreting patterns as Unicode patterns instead of
  BMP patterns
* the dotAll flag (`/s`) for changing the behavior
  of '.' to also match line terminators
* Escapes for character classes described by Unicode
  property groups (e.g., \p{Greek} to match all Greek
  characters, or \P{Greek} for all non-Greek characters).

The following TC39 proposals describe some of the added features:

* https://github.com/tc39/proposal-regexp-dotall-flag
* https://github.com/tc39/proposal-regexp-unicode-property-escapes

These additional changes are included:

* Extends named capture group names to include the full
  range of identifier characters supported by ECMAScript,
  not just ASCII.
* Changing the RegExp interface to return RegExpMatch
  objects, not Match objects, so that downcasting is
  not necessary to use named capture groups from Dart

**Note**: The changes to the RegExp interface are a
breaking change for implementers of the RegExp interface.
Current users of the RegExp interface (i.e., code using Dart
RegExp objects) will not be affected.

Change-Id: Ie62e6082a0e2fedc1680ef2576ce0c6db80fc19a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/100641
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Stevie Strickland <sstrickl@google.com>
2019-04-29 09:11:48 +00:00
Keerti Parthasarathy 9238e25305 Revert "[vm] Finish adding support for ECMAScript 2018 features."
This reverts commit 5ebb640a67.

Reason for revert: <INSERT REASONING HERE>

Original change's description:
> [vm] Finish adding support for ECMAScript 2018 features.
> 
> This work pulls in v8 support for these features with
> appropriate changes for Dart and closes
> https://github.com/dart-lang/sdk/issues/34935.
> 
> This adds support for the following features:
> 
> * Interpreting patterns as Unicode patterns instead of
>   BMP patterns
> * the dotAll flag (`/s`) for changing the behavior
>   of '.' to also match line terminators
> * Escapes for character classes described by Unicode
>   property groups (e.g., \p{Greek} to match all Greek
>   characters, or \P{Greek} for all non-Greek characters).
> 
> The following TC39 proposals describe some of the added features:
> 
> * https://github.com/tc39/proposal-regexp-dotall-flag
> * https://github.com/tc39/proposal-regexp-unicode-property-escapes
> 
> These additional changes are included:
> 
> * Extends named capture group names to include the full
>   range of identifier characters supported by ECMAScript,
>   not just ASCII.
> * Changing the RegExp interface to return RegExpMatch
>   objects, not Match objects, so that downcasting is
>   not necessary to use named capture groups from Dart
> 
> **Note**: The changes to the RegExp interface are a
> breaking change for implementers of the RegExp interface.
> Current users of the RegExp interface (i.e., code using Dart
> RegExp objects) will not be affected.
> 
> Change-Id: I0709ed0a8d5db36680e32bbad585594857b9ace4
> Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/95651
> Commit-Queue: Stevie Strickland <sstrickl@google.com>
> Reviewed-by: Johnni Winther <johnniwinther@google.com>
> Reviewed-by: Lasse R.H. Nielsen <lrn@google.com>
> Reviewed-by: Martin Kustermann <kustermann@google.com>

TBR=lrn@google.com,kustermann@google.com,jmesserly@google.com,johnniwinther@google.com,sstrickl@google.com

# Not skipping CQ checks because original CL landed > 1 day ago.

Change-Id: I1eda0fee4fd9e94df095944049833a67b07277e2
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/100560
Reviewed-by: Keerti Parthasarathy <keertip@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Commit-Queue: Keerti Parthasarathy <keertip@google.com>
2019-04-25 14:29:51 +00:00
Stevie Strickland 5ebb640a67 [vm] Finish adding support for ECMAScript 2018 features.
This work pulls in v8 support for these features with
appropriate changes for Dart and closes
https://github.com/dart-lang/sdk/issues/34935.

This adds support for the following features:

* Interpreting patterns as Unicode patterns instead of
  BMP patterns
* the dotAll flag (`/s`) for changing the behavior
  of '.' to also match line terminators
* Escapes for character classes described by Unicode
  property groups (e.g., \p{Greek} to match all Greek
  characters, or \P{Greek} for all non-Greek characters).

The following TC39 proposals describe some of the added features:

* https://github.com/tc39/proposal-regexp-dotall-flag
* https://github.com/tc39/proposal-regexp-unicode-property-escapes

These additional changes are included:

* Extends named capture group names to include the full
  range of identifier characters supported by ECMAScript,
  not just ASCII.
* Changing the RegExp interface to return RegExpMatch
  objects, not Match objects, so that downcasting is
  not necessary to use named capture groups from Dart

**Note**: The changes to the RegExp interface are a
breaking change for implementers of the RegExp interface.
Current users of the RegExp interface (i.e., code using Dart
RegExp objects) will not be affected.

Change-Id: I0709ed0a8d5db36680e32bbad585594857b9ace4
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/95651
Commit-Queue: Stevie Strickland <sstrickl@google.com>
Reviewed-by: Johnni Winther <johnniwinther@google.com>
Reviewed-by: Lasse R.H. Nielsen <lrn@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
2019-04-24 09:24:16 +00:00
Stevie Strickland af5fc2d4d2 [VM] Partial support for named regexp captures.
See https://github.com/tc39/proposal-regexp-named-groups
for a high-level description of the feature and examples.  This is one of the
features requested in https://github.com/dart-lang/sdk/issues/34935.

This is a partial implementation because while there is a way to retrieve
groups via Dart by name, it requires casting the returned Match to the
new RegExpMatch interface to avoid changing the RegExp interface.
Changing the RegExp interface will happen in a future update, since there
are other planned changes to the RegExp interface coming soon and that way
we only change it once. See https://github.com/dart-lang/sdk/issues/36171
for more details on the planned changes.

Also, since only BMP regular expressions are supported, not full
Unicode ones (i.e., those with the /u flag in ECMAscript), \k<NAME>
will only be parsed as a named back reference if there are named
captures in the string. Otherwise, the \k will be parsed as the identity
escape for backwards compatibility. The new tests illustrate this
difference.

Change-Id: Ieeb0374813db78924c9aa8ac3e652dfb6d4a5934
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/95461
Commit-Queue: Stevie Strickland <sstrickl@google.com>
Reviewed-by: Lasse R.H. Nielsen <lrn@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Jenny Messerly <jmesserly@google.com>
Reviewed-by: Johnni Winther <johnniwinther@google.com>
2019-03-19 10:40:15 +00:00
Stevie Strickland f31b7928ae [VM] Adding regexp lookbehind assertion support.
See https://github.com/tc39/proposal-regexp-lookbehind
for a high-level description of the feature and examples.  This is one of the
features requested in https://github.com/dart-lang/sdk/issues/34935.

This work takes the feature as present in the v8 engine and appropriately
merges it into our irregexp fork. Notable changes to the irregexp codebase to
introduce this feature:

-----

We can no longer assume that all matching proceeds forwards, since lookbehind
matching proceeds backwards. Similarly, we cannot assume that we can only be
at the start of a string if we started matching from that point. The direction
of matching must also be taken into consideration when doing bounds checking,
which previously assumed the engine would never attempt to look before the
start of a string.

-----

We may now parse backreferences to captures before the capture they
reference, since we parse regular expressions left to right, but lookbehinds
perform captures as they evaluate the string from right to left.  Since
RegExpBackReference objects contain a pointer to their corresponding capture,
this means that we may need to create RegExpCapture objects prior to the
parsing of the corresponding captured subexpression.

Thus, RegExpCapture objects are now only initialized with their index, and the
body is set later when the subexpression is encountered and parsed. This means
any method that operates on the body of a RegExpCapture can no longer be const,
which also affects the rest of the RegExpTree class hierarchy. This also means
that we don't have a valid max_match length for backreferences based off the
capture body, and must assume they can end up being any length.

-----


Change-Id: Iffe0e71b17b1a0c6fea77235e8aee5c093005811
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/94540
Commit-Queue: Stevie Strickland <sstrickl@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
2019-03-14 14:26:47 +00:00
Vyacheslav Egorov 8a179fb953 [VM, Compiler] Move compiler to a separate folder.
New folder structure (nested under vm/):

- compiler/
-   jit/         - JIT specific code
-   aot/         - AOT specific code
-   backend/     - all middle-end and back-end code (IL, flow graph)
-   assembler/   - assemblers and disassemblers
-   frontend/    - front ends (AST -> IL, Kernel -> IL)

compiler/README.md would be the documentation root for the compiler
pipeline

Bug: https://github.com/dart-lang/sdk/issues/30575
Change-Id: I2dfd9688793bff737f7632ddc77fca766875ce36
Reviewed-on: https://dart-review.googlesource.com/2940
Reviewed-by: Vyacheslav Egorov <vegorov@google.com>
Commit-Queue: Vyacheslav Egorov <vegorov@google.com>
2017-09-04 15:15:18 +00:00
Zachary Anderson 6cd8a79078 VM: Re-format to use at most one newline between functions
R=asiva@google.com

Review-Url: https://codereview.chromium.org/2974233002 .
2017-07-13 08:08:37 -07:00
Aske Simon Christensen ab7d9a5720 Omit JIT compiler from precompiled runtime on ARM, ARM64 and IA32.
Saves about a megabyte on the size of the precompiled runtime VM.

Next step will be to eliminate the assemblers and code stubs as well. This requires some more disentangling.

Until then, the X64 build still includes the compiler, since its assembler has a dependency on locations.cc, which is part of the compiler.

BUG= https://github.com/dart-lang/sdk/issues/30045
R=rmacnak@google.com, zra@google.com

Review-Url: https://codereview.chromium.org/2960413002 .
2017-07-11 12:01:49 +02:00
Vyacheslav Egorov 7887c34a29 VM(RegExp): Allow OSR optimization of RegExp :matcher functions.
Previously these functions would only contain a single CheckStackOverflowInstr
in a backtracking block and that CheckStackOverflowInstr would have a zero
loop_depth - which means it would not be considered eligable for OSR.

This change:

* adds CheckStackOverflowInstr with non-zero loop_depth in two other places
  (Boyer-Moore lookahead skip loop and greedy loop) where loops arise in the
  generated IL;
* sets non-zero loop depth on the CheckStackOverflowInstr in the backtracking
  block;
* adds a flag on CheckStackOverflowInstr that allows optimizing compiler to
  optimize away those checks that were inserted solely to serve as OSR entries.
* ensures that IR generated by IRRegExpMacroAssembler is OSR compatible:
  * GraphEntryInstr has correct osr_id;
  * GraphEntry and normal entry have different block ids (B0 and B1 - instead of B0 and B0);
  * unreachable blocks are pruned and GraphEntry is rewired to point to OSR entry;
  * IRRegExpMacroAssembler::GrowStack should not assume that  stack_array_cell and :stack
    are always in sync, because :stack can come from OSR or deoptimization why stack_array_cell
    is a constant associated with a particular Code object.
* refactors the way the RegExp stack was growing: instead of having a special instruction
  just emit a call to a Dart function;
* refactors the way block pruning for OSR is done by consolidating duplicated code
  in a single function.

We allow the optimizing compiler to remove preemption checks from
non-backtracking loops in the regexp code because those loops
unlike backtracking have guaranteed O(input_length) time
complexity.

Performance Implications
------------------------

This change improves performance of regexps in cases where regexp spends a lot
of time in the first invocation (either due to backtracking or due to long non
matching prefix) by allowing VM to optimize the :matcher while :matcher is
running.

For example on regex-redux[1] benchmark it improves Dart performance by 3x
(from ~18s to ~6s on my Mac Book Pro).

CL history
----------

This relands commit d87cc52c3e.

Original code review: https://codereview.chromium.org/2950783003/

[1] https://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexredux&lang=dart&id=2

R=erikcorry@google.com

Review-Url: https://codereview.chromium.org/2951053003 .
2017-06-23 12:51:53 +02:00
Dmitry Stefantsov 8d583bde5b Revert "VM(RegExp): Allow OSR optimization of RegExp :matcher functions."
This reverts commit d87cc52c3e.

TBR=vegorov@google.com

Review-Url: https://codereview.chromium.org/2947143002 .
2017-06-21 14:17:21 +02:00
Vyacheslav Egorov d87cc52c3e VM(RegExp): Allow OSR optimization of RegExp :matcher functions.
Previously these functions would only contain a single CheckStackOverflowInstr
in a backtracking block and that CheckStackOverflowInstr would have a zero
loop_depth - which means it would not be considered eligable for OSR.

This change:

* adds CheckStackOverflowInstr with non-zero loop_depth in two other places
  (Boyer-Moore lookahead skip loop and greedy loop) where loops arise in the
  generated IL;
* sets non-zero loop depth on the CheckStackOverflowInstr in the backtracking
  block;
* adds a flag on CheckStackOverflowInstr that allows optimizing compiler to
  optimize away those checks that were inserted solely to serve as OSR entries.

We allow the optimizing compiler to remove preemption checks from
non-backtracking loops in the regexp code because those loops
unlike backtracking have guaranteed O(input_length) time
complexity.

Performance Implications
------------------------

This change improves performance of regexps in cases where regexp spends a lot
of time in the first invocation (either due to backtracking or due to long non
matching prefix) by allowing VM to optimize the :matcher while :matcher is
running.

For example on regex-redux[1] benchmark it improves Dart performance by 3x
(from ~18s to ~6s on my Mac Book Pro).

[1] https://benchmarksgame.alioth.debian.org/u64q/program.php?test=regexredux&lang=dart&id=2

BUG=
R=erikcorry@google.com

Review-Url: https://codereview.chromium.org/2950783003 .
2017-06-21 13:46:48 +02:00
Vyacheslav Egorov 2403444eba VM: Optimize RegExp.matchAsPrefix(...) by generating a sticky RegExp specialization.
This is the same as a sticky RegExp flag in ES2015.

Overlay some RegExp fields on top of each other - given that they should never be used simultaneously.

BUG=http://dartbug.com/27810
R=rmacnak@google.com

Review URL: https://codereview.chromium.org/2510783002 .
2016-11-17 17:46:21 +01:00
Zachary Anderson a1bcf051d8 clang-format runtime/vm
R=johnmccutchan@google.com

Review URL: https://codereview.chromium.org/2481873005 .
2016-11-08 13:54:47 -08:00
Zachary Anderson 103881d01c Make header include guards great again
i.e. #ifndef VM_WHATEVER -> #ifndef RUNTIME_VM_WHATEVER

This lets us remove a hack from the PRESUBMIT.py script that existed
for reasons that are no longer valid, and sets us up to add some
presubmit checks for the GN build.

R=asiva@google.com, rmacnak@google.com

Review URL: https://codereview.chromium.org/2450713004 .
2016-10-26 00:26:03 -07:00
Ivan Posva 5be5d54529 - Refactor Symbol allocation to expect a thread parameter.
- Preallocate all tokens as symbols to avoid repeated lookups.
- Pass thread/zone where useful while doing this change.
- Avoid allocating symbols for error messages.

BUG=
R=fschneider@google.com

Review URL: https://codereview.chromium.org/1870343002 .
2016-04-11 16:28:29 -07:00
Ryan Macnak b070d98f6e Simpler regex names:
C++ JSRegExp -> RegExp
  Dart _JSSyntaxRegExp -> _RegExp
  Function RegExp.:irregexp: pattern -> RegExp.:matcher: pattern

R=johnmccutchan@google.com

Review URL: https://codereview.chromium.org/1815333002 .
2016-03-21 17:21:05 -07:00
Ryan Macnak d1475c0356 Port irregexp bytecode compiler and interpreter from V8 r24065.
R=srdjan@google.com

Review URL: https://codereview.chromium.org//1201383002 .
2015-07-07 14:43:32 -07:00
koda@google.com 9c181ec6d5 Thread/Isolate refactoring: new(Isolate*) -> new(Zone*)
Refactor all remaning cases where the current zone is used through new(Isolate*) and remove this interface.

Removing this interface is needed to move towards multiple threads per isolate, and also makes the caller more aware of the scope of the zone used, reducing the risk of use-after-free.

Make the current thread and the stack zone created around native/runtime entries directly available in their body, saving an indirection (and optimized away if unused).

R=iposva@google.com

Review URL: https://codereview.chromium.org//982873004

git-svn-id: https://dart.googlecode.com/svn/branches/bleeding_edge/dart@44541 260f80e4-7a28-3924-810f-c04153c831b5
2015-03-17 19:24:26 +00:00
zerny@google.com c77ab72c54 Use a typed array for the irregexp stack.
BUG=
R=vegorov@google.com

Review URL: https://codereview.chromium.org//800433004

git-svn-id: https://dart.googlecode.com/svn/branches/bleeding_edge/dart@43039 260f80e4-7a28-3924-810f-c04153c831b5
2015-01-21 15:11:36 +00:00
zerny@google.com 8e807c8550 Integrate the Irregexp Regular Expression Engine.
BUG=http://dartbug.com/19090
R=fschneider@google.com

Committed: https://code.google.com/p/dart/source/detail?r=41949

Review URL: https://codereview.chromium.org//744853003

git-svn-id: https://dart.googlecode.com/svn/branches/bleeding_edge/dart@41983 260f80e4-7a28-3924-810f-c04153c831b5
2014-11-26 09:32:43 +00:00
zerny@google.com 84a4135901 Revert "Integrate the Irregexp Regular Expression Engine."
This reverts commit https://code.google.com/p/dart/source/detail?r=41949

TBR=fschneider@google.com
BUG=

Review URL: https://codereview.chromium.org//754383002

git-svn-id: https://dart.googlecode.com/svn/branches/bleeding_edge/dart@41950 260f80e4-7a28-3924-810f-c04153c831b5
2014-11-25 13:36:40 +00:00
zerny@google.com 43ac0c6f33 Integrate the Irregexp Regular Expression Engine.
BUG=http://dartbug.com/19090
R=fschneider@google.com

Review URL: https://codereview.chromium.org//744853003

git-svn-id: https://dart.googlecode.com/svn/branches/bleeding_edge/dart@41949 260f80e4-7a28-3924-810f-c04153c831b5
2014-11-25 13:10:21 +00:00
zerny@google.com 94a510176b Copy irregexp related code from V8.
This "syntactic" change copies the irregexp code from V8 r24065 to Dart.
Compared to the V8 files the only changes are that the files have been
suitably renamed, headers and footers have been updated, and most of the
unneeded code has been replaced by a 'SNIP' marker. None of the files
are added to the build and they cannot compile. Integration with the
Dart VM is done in a follow-up CL [1].

[1]: https://codereview.chromium.org/683433003/

BUG=
R=iposva@google.com

Review URL: https://codereview.chromium.org//678193004

git-svn-id: https://dart.googlecode.com/svn/branches/bleeding_edge/dart@41455 260f80e4-7a28-3924-810f-c04153c831b5
2014-11-03 10:00:31 +00:00