From 08cacea7d579740f5407d198cb0fed994ab7e2b8 Mon Sep 17 00:00:00 2001
From: Andrew Kaster
Date: Thu, 30 Nov 2023 17:10:00 -0700
Subject: [PATCH] Documentation: Begin document on execution and navigation in LibWeb

---
 .../Browser/ | 156 ++++++++++++++++++
 1 file changed, 156 insertions(+)
 create mode 100644 Documentation/Browser/

diff --git a/Documentation/Browser/ b/Documentation/Browser/
new file mode 100644
index 0000000000..d67f8c07a9
--- /dev/null
+++ b/Documentation/Browser/
@@ -0,0 +1,156 @@
+# LibWeb: Browsing Contexts and Navigables
+
+**NOTE: This document is a work in progress!**
+
+## Introduction: How does code execute, really?
+
+Before we can dive into how LibWeb and Ladybird implement the HTML web page navigation operations,
+we need to cover some fundamental specification concepts. We start with a basic question: how does code
+actually execute in a (possibly virtual) machine? Next we'll look at what that means for the ECMAScript
+Specification (JavaScript), and finally at how the ECMAScript code execution model ties into the
+HTML specification to model how web content is displayed in a browser tab.
+
+### Native Code Execution: A Primer
+
+To model the execution of a native program written in a popular systems language like
+C, C++, or Rust, most systems programmers reach for the familiar concepts of *threads*
+and *processes*. In a "hosted" environment, the execution of one's userspace program generally
+starts with an underlying operating system creating a process for the application to run in.
+This process will contain a memory space for program data and code to live in, and an initial,
+or main, thread to start execution on. In order for the operating system to change which
+thread is executing on a particular CPU core, it needs to save and restore the *Execution Context*
+for that thread.
The Execution Context for a native thread generally consists of a set of
+CPU registers, any floating-point state, a program counter that tracks which instruction should
+be loaded next, and a stack pointer that points to the local data the thread was using to track
+its function call stack and local variables. The programmer can also request additional threads
+through a system call, providing a set of thread attributes and a function to call as the entry
+point.
+
+For traditional compiled programs, the concept of accessing variables and functions is split into
+two phases. At compile time, local variables and arguments are assigned to stack slots or
+allocated to registers. Exported variables and functions are written into the executable object
+file (ELF, Mach-O, PE, etc.) and are visible to external tools as symbols, as referenced by a
+symbol table contained within the object file format. Normally, local variable and argument
+names and locations are lost in the compile+link steps, but the compiler can be configured to
+emit extra debug information to allow debuggers to access and modify them at runtime. In order
+to support something like the dynamic imports of interpreted languages, the programmer has to
+call a platform-specific function to load the new module (e.g. ``dlopen`` or ``LoadLibrary``).
+Even after the module is opened, actually referring to any exported symbol from that module requires the
+programmer to retrieve the address of that symbol through another platform-specific function
+(e.g. ``dlsym`` or ``GetProcAddress``), once per symbol.
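The two-step load-then-look-up pattern described above can be sketched with the POSIX ``dlopen``/``dlsym`` pair. This is a minimal illustration, not code from LibWeb: the helper name ``call_from_library`` and the ``UnaryMathFn`` alias are hypothetical, and the library name used in the usage note assumes a glibc-based Linux system.

```cpp
#include <dlfcn.h> // POSIX dynamic loading: dlopen, dlsym, dlclose, dlerror

#include <cstdio>

// The loader only hands back an untyped address, so the programmer has to
// supply the function signature. (Hypothetical alias for this sketch.)
using UnaryMathFn = double (*)(double);

// Print the most recent loader error, tolerating a null dlerror() result.
static void report_loader_error(char const* step)
{
    char const* message = dlerror();
    std::fprintf(stderr, "%s failed: %s\n", step, message ? message : "unknown error");
}

// Hypothetical helper: load `library_path`, resolve `symbol_name`, call it with
// `argument`, and store the return value in `result`.
// Returns false if the module or the symbol cannot be found.
bool call_from_library(char const* library_path, char const* symbol_name, double argument, double& result)
{
    // Step 1: map the module into the process.
    void* handle = dlopen(library_path, RTLD_NOW);
    if (!handle) {
        report_loader_error("dlopen");
        return false;
    }

    // Step 2: resolve the symbol by name. This is repeated once per symbol.
    void* address = dlsym(handle, symbol_name);
    if (!address) {
        report_loader_error("dlsym");
        dlclose(handle);
        return false;
    }

    // The cast from an object pointer to a function pointer is what POSIX expects here.
    auto function = reinterpret_cast<UnaryMathFn>(address);
    result = function(argument);

    dlclose(handle);
    return true;
}
```

On a glibc-based system, ``call_from_library("libm.so.6", "cos", 0.0, result)`` would be one way to exercise it; other platforms name their math library differently, and older toolchains may need ``-ldl`` at link time.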
+
+### ECMAScript Execution Model: Realms and Agents
+
+The ECMAScript specification has analogs for almost all of these concepts in the section on
+[Executable Code and Execution Contexts](
+
+Working in the other direction from the native code explanation, ECMAScript describes the accessibility
+and scopes of functions, variables, and arguments in terms of [*Environment Records*](
+Note that these Environment Records are not actually visible to executing code, and are simply a mechanism
+used by the specification authors to model the language. Every function and module has its own kind
+of Environment Record that contains the variables, functions, catch clause bindings, and other
+language constructs that affect which names are visible at any location in the code. These Environment Records
+are nested in a tree-like structure that somewhat matches the Abstract Syntax Tree (AST).
+
+The root of the tree of Environment Records is the Global Environment Record, which corresponds to the
+Global Object and its properties. In JavaScript, there is always a ``this`` value representing the current
+object context. At global scope, the Global Object normally takes that responsibility. In a REPL, that might
+be some REPL-specific global object that has global functions to call for doing things like loading
+from the filesystem, or something as complex as the global objects of Node or Bun. In a browser context, the Global Object is
+normally the Window, unless there's a Worker involved. For historical reasons, the global ``this`` binding for
+Window contexts is actually a WindowProxy that wraps the Window. This concept is quite different from a native
+executable, where there's no actual object representing the global scope, simply symbols that the
+linker and loader make available to each module.
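The nesting of Environment Records, with the Global Environment Record at the root, can be illustrated with a small model. This is only a sketch under simplifying assumptions: the ``EnvironmentRecord`` struct and ``resolve_binding`` function are hypothetical names rather than LibJS classes, and bindings hold plain integers instead of JavaScript values.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical model of a specification Environment Record: a set of name
// bindings plus a link to the enclosing (outer) record.
struct EnvironmentRecord {
    std::map<std::string, int> bindings;
    EnvironmentRecord const* outer { nullptr }; // nullptr marks the global record
};

// Resolve `name` by walking from the innermost record outward toward the
// global record. The first record that binds the name wins, which is how an
// inner declaration shadows an outer one.
std::optional<int> resolve_binding(EnvironmentRecord const& innermost, std::string const& name)
{
    for (auto const* record = &innermost; record; record = record->outer) {
        auto it = record->bindings.find(name);
        if (it != record->bindings.end())
            return it->second;
    }
    // No record on the chain binds the name.
    return std::nullopt;
}
```

With a global record binding ``x``, and a function record that also binds ``x`` and points to the global record as its outer record, a lookup of ``x`` starting from the function record yields the function's binding, while a lookup of a name bound only at global scope falls through to the root.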
+
+While the Global Object and its Global Environment represent the root of the tree of identifiers visible
+to the executing JavaScript code, the Global Object isn't sufficient to model all the state around
+a conceptual thread of execution in ECMAScript. This is where the two concepts of [*Realms*](
+and [*Execution Contexts*]( come into play.
+A [*Realm Record*]( is a container that holds a global object,
+its associated Global Environment, a set of intrinsic objects, and any extra state defined by the *host* (also
+called an *embedder* in some specification documents) that needs to be associated with the realm.
+In LibWeb, the Host Defined slot holds an object that has the HTML Environment Settings Object for each realm,
+as well as all the prototypes, constructors, and namespaces that need to be exposed on the Global Object
+for Web APIs. On top of the Realm abstraction, ECMAScript uses the Execution Context to model the state
+of execution of one particular script or module. Each Execution Context belongs to an [*execution context stack*](
+with the topmost context named the [*running execution context*](
+An Execution Context has information about the current function, the script or module that the current code block belongs to,
+additional Environment Records required to access names in the current scope, any running generator state,
+and, most importantly for the thread analogy, the state needed to suspend and resume execution of that script.
+As with Environment Records, Realms and Execution Contexts are not directly accessible to running JavaScript code.
+
+The final missing piece for the JavaScript execution model is how these stacks of Execution Contexts
+are actually scheduled to run by the ECMAScript implementation. In the most common case, this means directly
+mapping the ECMAScript model to the earlier native concepts of threads and processes in a way that
+allows for flexibility in the implementation strategies.
The last thing that the specification authors want
+to do is constrain implementations so much that innovation and experimentation become impossible.
+This mapping is expressed through two related specification mechanisms, [*Agents*](
+and [*Agent Clusters*]( The Execution Context stack mentioned
+above actually belongs to an Agent, which holds said stack, a set of metadata about the memory model,
+and a shared reference to an [*executing thread*](
+According to ECMAScript, there should always be at least one Execution Context on the stack, to allow concepts
+such as the running execution context to always refer to the topmost Execution Context of the [*surrounding agent*](
+However, the HTML specification opts to remove the default execution context from the execution context stack
+at creation, and instead manually pushes and pops execution contexts for script, module, and callback execution.
+The relationship between Realms and Agents is not one-to-one, but many-to-one. In the ECMAScript specification, this manifests
+as a part of the [*Shadow Realm proposal*](, while the Web platform
+requires multiple Realms per Agent to specify the historical behavior of ``