mirror of
https://github.com/dart-lang/sdk
synced 2024-09-16 00:29:48 +00:00
[dart2js]: Update README
One pass to update the general description of the compiler pipeline.

Change-Id: I0597958139e9ea11b27dcb6072b8d70a90c9c937
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/242505
Reviewed-by: Mayank Patke <fishythefish@google.com>
Commit-Queue: Sigmund Cherem <sigmund@google.com>
parent
4f0ed6a45c
commit
bf2cff83d5
|
@@ -4,188 +4,101 @@ Welcome to the sources of the dart2js compiler!
|
||||||
|
|
||||||
## Architecture
|
## Architecture
|
||||||
|
|
||||||
The compiler is currently undergoing a long refactoring process. As you navigate
|
The compiler is structured to operate in several phases. By default these phases
|
||||||
this code you may find it helpful to understand how the compiler used to be,
|
are executed in sequence in a single process, but on some build systems, some of
|
||||||
where it is going, and where it is today.
|
these phases are split into separate processes. As such, there is plenty of
|
||||||
|
indirection and data representations used mostly for the purpose of serializing
|
||||||
|
intermediate results during compilation.
|
||||||
|
|
||||||
### The near future architecture
|
The current compiler phases are:
|
||||||
|
|
||||||
The compiler will operate in these general phases:
|
1. **common front-end**: Execute traditional front-end compilation phases.
|
||||||
|
Dart2js delegates to the common front-end (also used by DDC and the VM) to
|
||||||
|
handle all front-end features. This includes:
|
||||||
|
* parsing Dart source code,
|
||||||
|
* type checking,
|
||||||
|
* inferring implicit user types, like locals with a `var` declaration,
|
||||||
|
* lowering or simplifying Dart features. For example, this is how many
|
||||||
|
syntactic features, like extension methods and list comprehensions, are
|
||||||
|
implemented.
|
||||||
|
* additional web-specific lowering or simplifications. For example,
|
||||||
|
expansion of JS-interop features and web specific implementation of
|
||||||
|
language features like late variables.
|
||||||
|
|
||||||
1. **load kernel**: Load all the code as kernel
|
The result of this phase is a kernel AST which is serialized as a `.dill`
|
||||||
* Collect dart sources transitively
|
file.
|
||||||
* Convert to kernel AST
|
|
||||||
|
|
||||||
(this will be handled by invoking the front-end package)
|
2. **modular analysis**: Using kernel as input, compute data recording
|
||||||
|
properties about each method in the program, especially around dependencies
|
||||||
|
and features they may need. We call this "impact data" (i1).
|
||||||
|
|
||||||
Alternatively, the compiler can start compilation directly from kernel files.
|
When the compiler runs as a single process, this is done lazily/on-demand
|
||||||
|
during the tree-shaking phase (below). However, this data can also be
|
||||||
|
computed independently for individual methods, files, or packages in the
|
||||||
|
application. That makes it possible to run this modularly and in parallel.
|
||||||
|
|
||||||
2. **model**: Create a Dart model of the program
|
The result of this phase can be emitted as files containing impact data in
|
||||||
* The kernel ASTs could be used as a model, so this might be a no-op or just
|
a serialized format.
|
||||||
creating a thin wrapper on top of kernel.
|
|
||||||
|
|
||||||
3. **tree-shake and create world**: Build world of reachable code
|
3. **tree-shake and create world**: Create a model to understand what parts of
|
||||||
* For each reachable piece of code:
|
the code are used by an application. This consists of:
|
||||||
* Compute impact (i1) from kernel AST
|
* creating an intermediate representation called the "K model" that
|
||||||
* Build a closed world (w1)
|
wraps our kernel representation
|
||||||
|
* calculating which classes and methods are considered live in the
|
||||||
|
program. This is done by incrementally combining impact data (i1)
|
||||||
|
starting from `main`, then visiting reachable methods in the program
|
||||||
|
with a Rapid Type Analysis (RTA) algorithm to aggregate impacts
|
||||||
|
together.
|
||||||
|
|
||||||
4. **analyze**: Run a global analysis
|
The result of this phase is what we call a "closed world" (w1). The closed
|
||||||
* Assume closed world semantics (from w1)
|
world is also a data structure that can answer interesting queries, such as:
|
||||||
* Produce a global result (g)
|
Is this interface implemented by a single class? Is this method available
|
||||||
* Like today (g) will contain type and nullability information
|
in any subtype of some interface? The answers to these questions can help
|
||||||
* After we adopt strong-mode types, we want to explore simplifying this
|
the compiler generate higher quality JavaScript.
|
||||||
to only contain native + nullability information.
|
|
||||||
|
|
||||||
5. **codegen model**: Create a JS model of the program
|
4. **global analysis**: Run a global analysis that assumes closed world
|
||||||
* Model JavaScript specific concepts (like the split of constructor bodies
|
semantics (from w1) and propagates information across method boundaries to
|
||||||
as separate elements) and provide a mapping to the Dart model
|
further understand what values flow through the program. This phase is
|
||||||
|
very valuable in narrowing down possibilities that are ambiguous based
|
||||||
|
solely on type information written by developers. It often finds
|
||||||
|
opportunities that enable the compiler to devirtualize or inline method
|
||||||
|
calls, generate code specializations, or trigger performance optimizations.
|
||||||
|
|
||||||
6. **codegen and tree-shake**: Generate code, as needed
|
The result of this phase is a "global result" (g).
|
||||||
* For each reachable piece of code:
|
|
||||||
* build ssa graph from kernel ASTs and global results (g)
|
5. **codegen model**: Create a JS or backend model of the program. This is an
|
||||||
* optimize ssa
|
intermediate representation of the entities in the program that we refer to
|
||||||
|
as the "J model". It is very similar to the "K model", but it is tailored
|
||||||
|
to model JavaScript specific concepts (like the split of constructor bodies
|
||||||
|
as separate elements) and provide a mapping to the Dart model.
|
||||||
|
|
||||||
|
6. **codegen**: Generate code for each method that is deemed necessary. This
|
||||||
|
includes:
|
||||||
|
* build an SSA graph from kernel ASTs and global results (g)
|
||||||
|
* optimize the SSA representation
|
||||||
* compute impact (i2) from optimized code
|
* compute impact (i2) from optimized code
|
||||||
* emit JS ASTs for the code
|
* emit JS ASTs for the code
|
||||||
* Build a codegen closed world (w2) from new impacts (i2)
|
|
||||||
|
|
||||||
7. **emit**: Assemble and minify the program
|
|
||||||
* Build program structure from the compiled pieces (w2)
|
7. **link tree-shake**: Using the results of codegen, we perform a second
|
||||||
|
round of tree-shaking. This is important because code that was deemed
|
||||||
|
reachable in (w1) may be found unreachable after optimizations. The process
|
||||||
|
is very similar to the earlier phase: we incrementally combine the codegen
|
||||||
|
impact data (i2) and compute a codegen closed world (w2).
|
||||||
|
|
||||||
|
|
||||||
|
When dart2js runs as a single process the codegen phase is done lazily and
|
||||||
|
on-demand, together with the tree-shaking phase.
|
||||||
|
|
||||||
|
8. **emit JavaScript files**: The final step is to assemble and minify the
|
||||||
|
final program. This includes:
|
||||||
|
* Build a JavaScript program structure from the compiled pieces (w2)
|
||||||
* Use frequency namer to minify names.
|
* Use frequency namer to minify names.
|
||||||
* Emit js and source map files.
|
* Emit js and source map files.
|
||||||
|
|
||||||
### The old architecture
|
## Code organization
|
||||||
|
|
||||||
The compiler used to operate as follows:
|
### Some terminology used in the compiler
|
||||||
|
|
||||||
1. **load dart**: Load all source files
|
|
||||||
* Collect dart sources transitively
|
|
||||||
* Scan enough tokens to build import dependencies.
|
|
||||||
|
|
||||||
2. **model**: Create a Dart model (aka. Element Model) of the program
|
|
||||||
* Do a diet-parse of the program to create the high-level element model
|
|
||||||
|
|
||||||
3. **resolve and tree-shake**: Resolve and build world of reachable code (the
|
|
||||||
resolution enqueuer)
|
|
||||||
* For each reachable piece of code:
|
|
||||||
* Parse the full body of the function
|
|
||||||
* Resolve it and enqueue other pieces that are reachable
|
|
||||||
* Type check the body of the function
|
|
||||||
|
|
||||||
4. **analyze**: Run a global analysis
|
|
||||||
* Assume closed world semantics (from everything enqueued by the resolver)
|
|
||||||
* Produce a global result about type and nullability information of method
|
|
||||||
arguments, return values, and receivers of dynamic sends.
|
|
||||||
|
|
||||||
5. **codegen and tree-shake**: Generate code, as needed (via the codegen
|
|
||||||
enqueuer)
|
|
||||||
* For each reachable piece of code:
|
|
||||||
* build ssa graph from resolved source ASTs and global results (g)
|
|
||||||
* optimize ssa
|
|
||||||
* enqueue visible dependencies
|
|
||||||
* emit js asts for the code
|
|
||||||
|
|
||||||
6. **emit**: Assemble and minify the program
|
|
||||||
* Build program structure from the compiled pieces
|
|
||||||
* Use frequency namer to minify names.
|
|
||||||
* Emit js and source map files.
|
|
||||||
|
|
||||||
### The architecture today (which might be changing while you read this!)
|
|
||||||
|
|
||||||
When using the `--use-kernel` flag, you can test the latest state of the
|
|
||||||
compiler as we are migrating to the new architecture. Currently it works as
|
|
||||||
follows:
|
|
||||||
|
|
||||||
1. **load dart**: (same as old compiler)
|
|
||||||
|
|
||||||
2. **model**: (same element model as old compiler)
|
|
||||||
|
|
||||||
3. **resolve, tree-shake and build world**: Build world of reachable code
|
|
||||||
* For each reachable piece of code:
|
|
||||||
* Parse full body of the function
|
|
||||||
* Resolve it from the parsed source ASTs
|
|
||||||
* Type check it (same as old compiler)
|
|
||||||
* Compute impact (i1) from resolved source ASTs (no kernel)
|
|
||||||
* Build a closed world (w1)
|
|
||||||
|
|
||||||
4. **kernelize**: Create kernel ASTs
|
|
||||||
* For all resolved elements in w1, compute their kernel representation using
|
|
||||||
the `rasta` visitor.
|
|
||||||
|
|
||||||
5. **analyze**: (almost same as old compiler)
|
|
||||||
|
|
||||||
6. **codegen and tree-shake**: Generate code, as needed
|
|
||||||
* For each reachable piece of code:
|
|
||||||
* build ssa graph from kernel ASTs (uses global results g)
|
|
||||||
* optimize ssa
|
|
||||||
* compute impact (i2) from optimized code
|
|
||||||
* emit js asts for the code
|
|
||||||
* Build a codegen closed world (w2) from new impacts (i2)
|
|
||||||
|
|
||||||
7. **emit**: (same as old compiler)
|
|
||||||
|
|
||||||
Some additional details worth highlighting:
|
|
||||||
|
|
||||||
* tree-shaking is close to working as we want: the notion of a world and world
|
|
||||||
impacts are computed explicitly:
|
|
||||||
|
|
||||||
* In the old compiler, the resolver and code generator directly
|
|
||||||
enqueued items to be processed; there was no knowledge of what had
|
|
||||||
to be done other than in the algorithm itself.
|
|
||||||
|
|
||||||
* Now the information is computed explicitly in two ways:
|
|
||||||
|
|
||||||
* The dependencies of a single element are computed as an "impact"
|
|
||||||
object; these are derived from the structure of the
|
|
||||||
code (either the resolved code or the generated code).
|
|
||||||
|
|
||||||
* The closed world is now an explicit concept that can be replaced in the
|
|
||||||
compiler.
|
|
||||||
|
|
||||||
* This allows us to delete the resolver in the future and replace it
|
|
||||||
with a kernel loader, an impact builder from kernel, and a kernel world.
|
|
||||||
|
|
||||||
* There is an implementation of a kernel impact builder, but it is not yet
|
|
||||||
in use in the compiler pipeline (gated on replacing the Dart model)
|
|
||||||
|
|
||||||
* We still depend on the Dart model computed by resolution, but progress has
|
|
||||||
been made introducing an abstraction common to the new and old models. The
|
|
||||||
old model is the "Element model", the generic abstraction is called the
|
|
||||||
"Entity model". Some portions of the compiler now refer to the entity model.
|
|
||||||
|
|
||||||
* The ssa graph is built from the kernel ASTs, but it still depends on the old
|
|
||||||
element model computed from resolution (accessed via a kernel2Ast adapter).
|
|
||||||
The graph builder implementation covers a large chunk of the language
|
|
||||||
features, but is not complete (89% of language & corelib tests are passing).
|
|
||||||
|
|
||||||
* Global analysis is still working on top of the dart2js ASTs.
|
|
||||||
|
|
||||||
## Code organization and history
|
|
||||||
|
|
||||||
The compiler package was initially intended to be a compiler for multiple targets:
|
|
||||||
JavaScript, Dart (dart2dart), and dartino bytecodes. It has now evolved to be a
|
|
||||||
JavaScript-only compiler, but some of the abstractions to support multiple
|
|
||||||
targets still remain.
|
|
||||||
|
|
||||||
### Possibly confusing terminology
|
|
||||||
|
|
||||||
Some of the terminology in the compiler is confusing without knowing its
|
|
||||||
history. We are cleaning this up as we are rearchitecting the system, but here
|
|
||||||
are some of the legacy terms we have:
|
|
||||||
|
|
||||||
* **target**: the output the compiler is producing. Nowadays it is just
|
|
||||||
JavaScript, but in the past there was also Dart and dartino bytecodes.
|
|
||||||
|
|
||||||
* **backend**: pieces of the compiler that were target-specific.
|
|
||||||
Note: in the past we've used the term *backend* also for code that is used
|
|
||||||
in the frontend of the compiler that happens to be target-specific, as well
|
|
||||||
as code that is used in the emitter or what traditionally is known
|
|
||||||
as the backend of the compiler.
|
|
||||||
|
|
||||||
* **frontend**: the parser, resolver, and other early stages of the compiler.
|
|
||||||
The front-end however makes target-specific choices. For example, to compile
|
|
||||||
a program with async-await, the dart2js backend needs to include some helper
|
|
||||||
functions that are used by the expanded async-await code; these helpers need
|
|
||||||
to be parsed by the frontend and added to the compilation pipeline.
|
|
||||||
|
|
||||||
* **world**: the compiler exploits closed-world assumptions to do
|
* **world**: the compiler exploits closed-world assumptions to do
|
||||||
optimizations. The *world* encapsulates some of our knowledge of the
|
optimizations. The *world* encapsulates some of our knowledge of the
|
||||||
|
@@ -201,29 +114,22 @@ are some of the legacy terminology we have:
|
||||||
|
|
||||||
* **model**: there are many models in the compiler:
|
* **model**: there are many models in the compiler:
|
||||||
|
|
||||||
* **element model**: this is an abstraction describing the elements seen in
|
* **entity model**: this is an abstraction describing the elements seen in
|
||||||
Dart programs, like "libraries", "classes", "methods", etc.
|
Dart programs, like "libraries", "classes", "methods", etc. We currently
|
||||||
|
have two entity models, the "K model" (which is frontend centric and
|
||||||
* **entity model**: also describes elements seen in Dart programs, but it is
|
usually maps 1:1 with kernel entities) and the "J model" (which is backend
|
||||||
meant to be minimalistic and a super-hierarchy above the *element models*.
|
centric).
|
||||||
This newer addition is an added abstraction to make it possible to
|
|
||||||
refactor our code from our old frontend to the kernel frontend.
|
|
||||||
|
|
||||||
* **Dart vs JS models**: the compiler in the past had a single model to
|
|
||||||
describe elements in the source and elements that were being compiled. In
|
|
||||||
the future we plan to have two. Both input model and output models will be
|
|
||||||
implementations of the *entity model*. The JS model is intended to have
|
|
||||||
concepts specific to generating code in JS (like constructor-bodies as
|
|
||||||
a separate entity than the constructor, closure classes, etc).
|
|
||||||
|
|
||||||
* **emitter model**: this is a model just used for dumping out the structure
|
* **emitter model**: this is a model just used for dumping out the structure
|
||||||
of the program in a .js text file. It doesn't have enough semantic meaning
|
of the program in a .js text file. It doesn't have enough semantic meaning
|
||||||
to be a JS model for compilation at this moment.
|
to be a JS model for compilation, which is why there is a separate "J
|
||||||
|
model".
|
||||||
|
|
||||||
* **enqueuer**: a work-queue used to achieve tree-shaking (or more precisely
|
* **enqueuer**: a work-queue used to achieve tree-shaking (or more precisely
|
||||||
tree-growing): elements are added to the enqueuer as we recognize that they
|
tree-growing): elements are added to the enqueuer as we recognize that they
|
||||||
are needed in a given application. Note that we even track how elements are
|
are needed in a given application (as described by the impact data). Note
|
||||||
used, since some ways of using an element require more code than others.
|
that we even track how elements are used, since some ways of using an
|
||||||
|
element require more code than others.
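
As a rough illustration of the enqueuer idea, the sketch below shows a work queue that records not only which elements are needed but also how each one is used. The `Enqueuer` class, the use kinds (`"invoke"`, `"read"`, `"write"`), and the `IMPACTS` table are hypothetical and invented for this example; they are not the names or the API used inside dart2js.

```python
from collections import deque


class Enqueuer:
    """Hypothetical work queue that grows the set of needed elements."""

    def __init__(self):
        self.uses = {}         # element -> set of use kinds seen so far
        self._queue = deque()  # (element, use) pairs still to process

    def register_use(self, element, use):
        """Record a use; queue work only the first time this use is seen."""
        seen = self.uses.setdefault(element, set())
        if use not in seen:
            seen.add(use)
            self._queue.append((element, use))

    def run(self, impact_of):
        """Drain the queue; impact_of(element, use) yields further uses."""
        while self._queue:
            element, use = self._queue.popleft()
            for dependency, dependency_use in impact_of(element, use):
                self.register_use(dependency, dependency_use)


# Toy impact data: which (element, use) pairs each processed item requires.
IMPACTS = {
    ("main", "invoke"): [("Point.x", "read"), ("Point.toString", "invoke")],
    ("Point.toString", "invoke"): [("Point.x", "read"), ("Point.y", "read")],
}

enqueuer = Enqueuer()
enqueuer.register_use("main", "invoke")
enqueuer.run(lambda element, use: IMPACTS.get((element, use), []))
```

After the run, `enqueuer.uses` shows that `Point.x` and `Point.y` are only ever read. Under the assumptions of this sketch, a compiler could use that to skip generating setter code for them, which is one example of why tracking the kind of use matters.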
|
||||||
|
|
||||||
### Code layout
|
### Code layout
|
||||||
|
|
||||||
|
|