From c87876763e88ddbe1d465912aff74ee4c0ffd451 Mon Sep 17 00:00:00 2001
From: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
Date: Wed, 26 Jun 2024 13:18:20 +0100
Subject: [PATCH] gh-119786: move frames documentation to InternalDocs and add details (#121009)

---
 InternalDocs/README.md                        |  2 +
 .../frame_layout.md => InternalDocs/frames.md | 91 ++++++++-----------
 2 files changed, 38 insertions(+), 55 deletions(-)
 rename Objects/frame_layout.md => InternalDocs/frames.md (64%)

diff --git a/InternalDocs/README.md b/InternalDocs/README.md
index 2918ead265d..95181a420f1 100644
--- a/InternalDocs/README.md
+++ b/InternalDocs/README.md
@@ -14,6 +14,8 @@ # CPython Internals Documentation

 [Compiler Design](compiler.md)

+[Frames](frames.md)
+
 [Adaptive Instruction Families](adaptive.md)

 [The Source Code Locations Table](locations.md)
diff --git a/Objects/frame_layout.md b/InternalDocs/frames.md
similarity index 64%
rename from Objects/frame_layout.md
rename to InternalDocs/frames.md
index b348e85689f..34682adb1b4 100644
--- a/Objects/frame_layout.md
+++ b/InternalDocs/frames.md
@@ -1,51 +1,47 @@
-# The Frame Stack
+# Frames

-Each call to a Python function has an activation record,
-commonly known as a "frame".
-Python semantics allows frames to outlive the activation,
-so they have (before 3.11) been allocated on the heap.
-This is expensive as it requires many allocations and
-results in poor locality of reference.
-
-In 3.11, rather than have these frames scattered about memory,
-as happens for heap-allocated objects, frames are allocated
-contiguously in a per-thread stack.
-This improves performance significantly for two reasons:
-* It reduces allocation overhead to a pointer comparison and increment.
-* Stack allocated data has the best possible locality and will always be in
-  CPU cache.
-
-Generator and coroutines still need heap allocated activation records, but
-can be linked into the per-thread stack so as to not impact performance too much.
-
-## Layout
-
-Each activation record consists of four conceptual sections:
+Each call to a Python function has an activation record, commonly known as a
+"frame". It contains information about the function being executed, consisting
+of three conceptual sections:

 * Local variables (including arguments, cells and free variables)
 * Evaluation stack
-* Specials: The per-frame object references needed by the VM: globals dict,
-  code object, etc.
-* Linkage: Pointer to the previous activation record, stack depth, etc.
+* Specials: The per-frame object references needed by the VM, including
+  globals dict, code object, instruction pointer, stack depth, the
+  previous frame, etc.

-### Layout
+The definition of the ``_PyInterpreterFrame`` struct is in
+[Include/internal/pycore_frame.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_frame.h).

-The specials and linkage sections are a fixed size, so are grouped together.
+# Allocation
+
+Python semantics allows frames to outlive the activation, so they need to
+be allocated outside the C call stack. To reduce overhead and improve locality
+of reference, most frames are allocated contiguously in a per-thread stack
+(see ``_PyThreadState_PushFrame`` in
+[Python/pystate.c](https://github.com/python/cpython/blob/main/Python/pystate.c)).
+
+Frames of generators and coroutines are embedded in the generator and coroutine
+objects, so are not allocated in the per-thread stack. See ``PyGenObject`` in
+[Include/internal/pycore_genobject.h](https://github.com/python/cpython/blob/main/Include/internal/pycore_genobject.h).
+
+## Layout

 Each activation record is laid out as:
-* Specials and linkage
+* Specials
 * Locals
 * Stack

 This seems to provide the best performance without excessive complexity.
-It needs the interpreter to hold two pointers, a frame pointer and a stack pointer.
+The specials have a fixed size, so the offset of the locals is known. The
+interpreter needs to hold two pointers, a frame pointer and a stack pointer.

 #### Alternative layout

 An alternative layout that was used for part of 3.11 alpha was:

 * Locals
-* Specials and linkage
+* Specials
 * Stack

 This has the advantage that no copying is required when making a call,
@@ -53,19 +49,6 @@ #### Alternative layout
 location for the parameters. However, it requires the VM to maintain
 an extra pointer for the locals, which can hurt performance.

-A variant that only needs the need two pointers is to reverse the numbering
-of the locals, so that the last one is numbered `0`, and the first in memory
-is numbered `N-1`.
-This allows the locals, specials and linkage to accessed from the frame pointer.
-We may implement this in the future.
-
-#### Note:
-
-> In a contiguous stack, we would need to save one fewer registers, as the
-> top of the caller's activation record would be the same at the base of the
-> callee's. However, since some activation records are kept on the heap we
-> cannot do this.
-
 ### Generators and Coroutines

 Generators and coroutines contain a `_PyInterpreterFrame`
@@ -92,25 +75,23 @@ ### Frame objects
 The `PyFrameObject` may outlive a stack-allocated `_PyInterpreterFrame`.
 If it does then `_PyInterpreterFrame` is copied into the `PyFrameObject`,
 except the evaluation stack which must be empty at this point.
-The linkage section is updated to reflect the new location of the frame.
+The previous frame link is updated to reflect the new location of the frame.

 This mechanism provides the appearance of persistent, heap-allocated
 frames for each activation, but with low runtime overhead.

 ### Generators and Coroutines

-
-Generator objects have a `_PyInterpreterFrame` embedded in them.
-This means that creating a generator requires only a single allocation,
-reducing allocation overhead and improving locality of reference.
-The embedded frame is linked into the per-thread frame when iterated or
-awaited.
+Generators (objects of type ``PyGen_Type``, ``PyCoro_Type`` or
+``PyAsyncGen_Type``) have a `_PyInterpreterFrame` embedded in them, so
+that they can be created with a single memory allocation.
+When such an embedded frame is iterated or awaited, it can be linked with
+frames on the per-thread stack via the linkage fields.

 If a frame object associated with a generator outlives the generator, then
-the embedded `_PyInterpreterFrame` is copied into the frame object.
-
-
-All the above applies to coroutines and async generators as well.
+the embedded `_PyInterpreterFrame` is copied into the frame object (see
+``take_ownership()`` in
+[Python/frame.c](https://github.com/python/cpython/blob/main/Python/frame.c)).

 ### Field names

@@ -119,7 +100,7 @@ ### Field names

 For example the `f_globals` field has a `f_` prefix implying it belongs to the
 `PyFrameObject` struct, although it belongs to the `_PyInterpreterFrame` struct.
-We may rationalize this naming scheme for 3.12.
+We may rationalize this naming scheme for a later version.


 ### Shim frames