...instead of allocating separate BorderRadiusCornerClipper for each
executed sample/blit commands pair.
With this change, the vector of BorderRadiusCornerClipper has far fewer
items. For example, on the Twitter profile page its size goes down from
~3000 to ~3 items.
Before, this check was needed to prevent crashing when attempting to
allocate a zero-size bitmap for sampled corners, which could have
happened if a corner had a 0 radius in one axis.
Now, since the SampleUnderCorners command is not emitted when a radius
is 0 in one axis, this check is no longer needed.
This reverts commit 6b7b9ca1c4.
The whole corner is invisible if its radius is 0 in either axis, so the
reverted commit was a mistake that led to adding error checking during
painting command execution (b61aab66d9) to avoid crashing on an attempt
to allocate a 0-size bitmap.
Deflate and WebP can store at most 15 bits per symbol, meaning their
huffman trees can be at most 15 levels deep.
During construction, when we hit this level, we used to try again
with an ever lower frequency cap per symbol. This had the effect
of giving the symbols with the highest frequency lower frequencies
first, causing the most-frequent symbols to be merged. For example,
maybe the most-frequent symbol had 1 bit, and the 2nd-most-frequent
two bits (and everything else at least 3). With the cap, the two
most frequent symbols might both end up with 2 bits, freeing up bits
for the lower levels of the tree.
This has the effect of making the most-frequent symbols longer at
first, which isn't great for file size.
Instead of using a frequency cap, ignore ever more of the low
bits of the frequency. This sacrifices resolution where it hurts least:
it affects the lower levels of the tree first, and the symbols there
are stored less frequently anyway.
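Roughly, the retry loop now looks like this (a standalone sketch with a
naive Huffman builder, not the actual LibCompress/LibGfx code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Ordinary (unbounded) Huffman construction; returns a code length per symbol.
static std::vector<size_t> build_code_lengths(std::vector<size_t> const& frequencies)
{
    struct Group {
        size_t frequency;
        std::vector<size_t> symbols;
    };
    std::vector<size_t> lengths(frequencies.size(), 0);
    std::vector<Group> groups;
    for (size_t i = 0; i < frequencies.size(); ++i) {
        if (frequencies[i] != 0)
            groups.push_back({ frequencies[i], { i } });
    }
    if (groups.size() == 1)
        lengths[groups[0].symbols[0]] = 1;
    while (groups.size() > 1) {
        std::sort(groups.begin(), groups.end(),
            [](auto const& a, auto const& b) { return a.frequency < b.frequency; });
        // Merging the two rarest groups makes every symbol in them one bit longer.
        for (size_t i = 0; i < 2; ++i) {
            for (auto symbol : groups[i].symbols)
                ++lengths[symbol];
        }
        groups[0].frequency += groups[1].frequency;
        groups[0].symbols.insert(groups[0].symbols.end(),
            groups[1].symbols.begin(), groups[1].symbols.end());
        groups.erase(groups.begin() + 1);
    }
    return lengths;
}

std::vector<size_t> build_length_limited_code(std::vector<size_t> const& frequencies, size_t max_length = 15)
{
    for (size_t shift = 0;; ++shift) {
        // Ignore ever more low bits of each frequency; keep used symbols at >= 1.
        std::vector<size_t> adjusted(frequencies.size(), 0);
        for (size_t i = 0; i < frequencies.size(); ++i) {
            if (frequencies[i] != 0)
                adjusted[i] = std::max<size_t>(frequencies[i] >> shift, 1);
        }
        auto lengths = build_code_lengths(adjusted);
        size_t deepest = 0;
        for (auto length : lengths)
            deepest = std::max(deepest, length);
        if (deepest <= max_length)
            return lengths;
        // Tree too deep: retry. Shifting loses resolution for rare symbols
        // first, instead of flattening the most frequent codes like the old
        // frequency cap did.
    }
}
```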
For deflate, the 64 kiB block size means this doesn't have a big
effect, but for WebP it can have a big effect:
sunset-retro.png (876K): 2.02M -> 1.73M -- now (very slightly) smaller
than twice the input size! Maybe we'll be competitive one day.
(For wow.webp and 7z7c.webp, it has no effect: we don't hit
the "tree too deep" case there, since those have relatively few
colors.)
No behavior change other than smaller file size. (No performance
cost either, and it's less code too.)
For our deflate, block size is limited to less than 64 kiB, so the sum
of all byte frequencies always fits in a u16 by construction.
While I haven't hit this in practice, it can conceivably happen
when writing WebP files, which currently use a single Huffman tree
(per channel) for a whole image -- which is often much larger than
64 kiB.
No dramatic behavior change in practice, just feels more correct.
This implements some of the basics of WebP compression: Huffman coding.
(The other parts of the basics are backreferences, and color cache
entries; and after that there are the four transforms -- predictor,
subtract green, color indexing, color.)
How much huffman coding helps depends on the input's entropy.
Constant-color channels are now encoded in constant space, but
otherwise a huffman code always needs at least one bit per symbol.
This means just huffman coding can at the very best reduce output
size to 1/8th of input size.
For three test input files:
sunset-retro.png (876K): 2.25M -> 2.02M
(helps fairly little; from 2.6x as big as the png input to 2.36x)
giphy.gif (184k): 11M -> 4.9M
(pretty decent, from 61x as big as the gif input to 27x as big)
7z7c.gif (11K): 775K -> 118K
(almost as good an improvement as possible for just huffman coding,
from 70x as big as the gif input to 10.7x as big)
No measurable perf impact on encoding.
The code is pretty similar to Deflate.cpp in LibCompress, with just
enough differences that sharing code doesn't look like it's worth
it to me. I left comments outlining similarities.
To be used in WebPWriter.
JPEGWriter currently hardcodes huffman tables; maybe it can use this
to build data-dependent huffman tables in the future as well.
Pure code move (except for removing the `DeflateCompressor::` prefix
on the function's name, and putting the default argument for the 4th
argument in the function's definition), no behavior change.
EventSource allows opening a persistent HTTP connection to a server over
which events are continuously streamed.
Unfortunately, our test infrastructure does not allow for automating any
tests of this feature yet. It only works with HTTP connections.
Ran into a crash here while testing LibProtocol changes. The method we
invoke here (did_progress) already accepts an Optional, and handles when
that Optional is empty. So there's no need to assume `total_size` is
non-empty.
Supporting unbuffered fetches is actually part of the fetch spec in its
HTTP-network-fetch algorithm. We had previously implemented this method
in a very ad-hoc manner as a simple wrapper around ResourceLoader. This
is still the case, but we now implement a good amount of these steps
according to spec, using ResourceLoader's unbuffered API. The response
data is forwarded through to the fetch response using streams.
This will eventually let us remove the use of ResourceLoader's buffered
API, as all responses should just be streamed this way. The streams spec
then supplies ways to wait for completion, thus allowing fully buffered
responses. However, we have more work to do to make the other parts of
our fetch implementation (namely, Body::fully_read) use streams before
we can do this.
This adds an alternate API to ResourceLoader to load HTTP/HTTPS/Gemini
requests unbuffered. Most of the changes here are moving parts of the
existing ResourceLoader::load method to helper methods so they can be
re-used by the new ResourceLoader::load_unbuffered.
LibWeb will need to use unbuffered requests to support server-sent
events. Connections for such events remain open and the remote end sends
data as HTTP bodies at its leisure. The browser needs to be able to
handle this data as it arrives, as the request essentially never
finishes.
To support this, this makes Protocol::Request operate in one of two
modes: buffered or unbuffered. The existing mechanism for setting up a
buffered request was a bit awkward; you had to set specific callbacks,
but be sure not to set some others, and then set a flag. The new
mechanism is to set the mode and the callbacks that the mode needs in
one API.
This is to avoid including any LibProtocol header in Objective-C source
files, which will cause a conflict between the Protocol namespace and a
@Protocol interface.
See Ladybird/AppKit/Application/ApplicationBridge.cpp for why this
conflict unfortunately cannot be worked around.
All painting commands except SetClipRect are shifted by scroll offset
before command list execution. This change removes scroll offset
translation for sample/blit corner commands in
`PaintableWithLines::paint` so it is only applied once in
`CommandList::apply_scroll_offsets()`.
Currently, these only work when there are no CSS transforms (as the
stacking context painting is not set up to handle that case yet). This
is still enough to get most chat/comment markers working on GitHub
though :^)
The WebAssembly spec never relies on host system information, like
size_t. For consistency's sake, we should stick to the usage of u32's
instead of size_t's. This didn't cause issues before because
LEB128-encoded u64's are a superset of LEB128-encoded u32's.
- Fewer allocations
- The optimization pass that skips unnecessary sample/blit corner
  commands could be more efficient by ignoring corner radius clipping
  for empty painting commands
Both page target bitmap and mask bitmap are always BGRA8888, so we can
use `get_pixel<StorageFormat::BGRA8888>` and
`set_pixel<StorageFormat::BGRA8888>` to avoid branching by not
checking a bitmap format.
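Illustrative sketch of why the compile-time format helps (simplified
stand-in types, not the real Gfx::Bitmap interface):

```cpp
#include <cstdint>
#include <vector>

enum class StorageFormat { BGRx8888, BGRA8888 };

struct Bitmap {
    StorageFormat format { StorageFormat::BGRA8888 };
    int width { 0 };
    int height { 0 };
    std::vector<uint32_t> data; // one 32-bit pixel per entry

    // Generic accessor: re-checks the format for every single pixel.
    uint32_t get_pixel(int x, int y) const
    {
        uint32_t pixel = data[y * width + x];
        if (format == StorageFormat::BGRx8888)
            pixel |= 0xff000000; // force opaque alpha
        return pixel;
    }

    // Format fixed at compile time: the branch disappears and the access
    // inlines cleanly into the caller's loop.
    template<StorageFormat known_format>
    uint32_t get_pixel(int x, int y) const
    {
        uint32_t pixel = data[y * width + x];
        if constexpr (known_format == StorageFormat::BGRx8888)
            pixel |= 0xff000000;
        return pixel;
    }
};
```

Since the masking code knows both bitmaps are BGRA8888, it can call the
templated accessor and let the compiler drop the check entirely.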
The if statement in the dispatch implies we are in the idle state, so of
course the active time will always be undefined. If this was cancelled
via a call to cancel(), we can save the time at that point. Otherwise,
just send 0.
I saw a null pointer dereference here on GitHub once, but don't know how
to reproduce, or how we'd get here. Nevertheless, null-checking the
navigable is reasonable so let's do it.
...that maintains a list of allocated execution contexts so malloc()
does not have to be called every time we need to get a new one.
20% improvement in Octane/typescript.js
16% improvement in Kraken/imaging-darkroom.js
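A minimal sketch of the idea, with hypothetical names (the real cache is
tied to the VM and does more bookkeeping):

```cpp
#include <memory>
#include <vector>

struct ExecutionContext { /* function, realm, arguments, registers, ... */ };

class ExecutionContextAllocator {
public:
    std::unique_ptr<ExecutionContext> allocate()
    {
        if (m_free_list.empty())
            return std::make_unique<ExecutionContext>(); // only allocates when the list is empty
        auto context = std::move(m_free_list.back());
        m_free_list.pop_back();
        *context = ExecutionContext {}; // reset the recycled object
        return context;
    }

    void deallocate(std::unique_ptr<ExecutionContext> context)
    {
        // Keep the allocation around for the next call instead of freeing it.
        m_free_list.push_back(std::move(context));
    }

private:
    std::vector<std::unique_ptr<ExecutionContext>> m_free_list;
};
```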
This patch fixes two issues:
- Animation events that should go to the target element now do
(some were previously being dispatched on the animation itself.)
- We update the "previous phase" and "previous iteration" fields of
animation effects, so that we can actually detect phase changes.
This means we stop thinking animations always just started,
something that caused each animation to send 60 animationstart
events every second (to the wrong target!)
We enqueue a microtask for this readable stream AO, and the methods that
we call from the chunk steps manipulate promises. As such, we need to
enable callbacks for the entire microtask's temporary execution
context.
By using a separate struct we can avoid updating AST node and
ECMAScriptFunctionObject constructors every time there is a need to
add or remove some additional information collected during parsing.
As it's not uncommon for users to drop the module instance on the floor
after having grabbed the few exports they need to hold on to.
Fixes a few UAFs that show up as "invalid" accesses to
memory/tables/etc.
Adds a TextEditor widget to the chess application to display move
history. It is updated whenever Board::apply_move is called to reflect
the current board state.
Allows skipping function environment allocation for non-arrow functions
if the only reason it is needed is to hold the `this` binding.
The parser is changed to do the following:
- If a function is an arrow function and uses `this` then all functions
in a scope chain are marked to allocate function environment for
`this` binding.
- If a function uses `new.target` then all functions in a scope chain
are marked to allocate function environment.
`ordinary_call_bind_this()` is changed to put the `this` value in the
execution context when function environment allocation is skipped.
35% improvement in Octane/typescript.js
50% improvement in Octane/deltablue.js
19% improvement in Octane/raytrace.js
This reverts commit a362c37c8b.
The commit tried to add an unused function that continued our tradition
of not properly waiting for things, but instead flooding the event loop
with condition checks that delay firing of the event. I think this is a
fundamentally flawed approach. See also the checks for
`fire_when_not_visible` in LibCore/EventLoopImplementationUnix.cpp.
This change adds the `width` and `height` properties to
`HTMLVideoElement` and `HTMLSourceElement`. These properties reflect
their respective content attribute values.
With this change, the ".*make.*" function family now does error
checking earlier, which improves the experience when using clangd. Note
that the change also makes them instantiate classes a bit more eagerly,
so in LibVideo/PlaybackManager, we have to first define
SeekingStateHandler and only then make() it.
Co-Authored-By: stelar7 <dudedbz@gmail.com>
A connection's socket is allowed to be null if it's being recreated
after a failed connect(). This was handled correctly in
request_did_finish but not in recreate_socket_if_needed; this commit
fixes that oversight.
This allows us to skip allocating a function environment in cases where
it was previously impossible because the arguments object needed a
binding.
This change does not bring visible improvement in Kraken or Octane
benchmarks but seems useful to have anyway.
ReadLoopReadRequest::on_chunk expects a Uint8Array, so make sure we
convert the passed-in ByteBuffer to a Uint8Array before passing it to
the AO writable_stream_default_writer_write.
Co-authored-by: Timothy Flynn <trflynn89@pm.me>
This callback is meant to be triggered by streams, which does not always
provide a WebIDL::DOMException. Pass a plain value instead. Of all the
users of this callback, only one actually uses the value, and already
converts the DOMException to a plain value.
This moves both Gfx::CanonicalCode::write_symbol() and
Compress::CanonicalCode::write_symbol() inline.
It also adds `__attribute__((always_inline))` on the arguments
to visit() in the latter. (ALWAYS_INLINE doesn't work on lambdas.)
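For reference, one spelling that works on a lambda (illustrated here
with std::variant/std::visit instead of AK's types):

```cpp
#include <variant>

int symbol_bits(std::variant<int, long> const& value)
{
    return std::visit(
        [](auto code) __attribute__((always_inline)) {
            // Stand-in for the write_symbol() body; the attribute asks the
            // compiler to inline this call operator into visit()'s dispatch.
            return static_cast<int>(code);
        },
        value);
}
```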
Numbers with `ministat`: I ran once:
Build/lagom/bin/image -o test.bmp Base/res/wallpapers/sunset-retro.png
and then ran to bench:
~/src/hack/bench.py -n 20 -o bench_foo1.txt \
Build/lagom/bin/image -o test.webp test.bmp
...and then `ministat bench_foo1.txt bench_foo2.txt` to compare.
The previous commit increased the time for this command by 38% compared
to the before state.
With this, it's an 8.6% regression. So still a regression, but a smaller
one.
Or, in other words, this commit reduces times by 21% compared to the
previous commit.
Numbers with hyperfine are similar -- with this on top of the previous
commit, this is a 7-11% regression, instead of an almost 50% regression.
(A local branch that changes how we compute CanonicalCodes so that we
actually compress a bit is perf-neutral since the image writing code
doesn't change.)
`hyperfine 'image -o test.webp test.bmp'`:
* Before: 23.7 ms ± 0.7 ms (116 runs)
* Previous commit: 33.2 ms ± 0.8 ms (82 runs)
* This commit: 25.5 ms ± 0.7 ms (102 runs)
`hyperfine 'animation -o wow.webp giphy.gif'`:
* Before: 85.5 ms ± 2.0 ms (34 runs)
* Previous commit: 127.7 ms ± 4.4 ms (22 runs)
* This commit: 95.3 ms ± 2.1 ms (31 runs)
`hyperfine 'animation -o wow.webp 7z7c.gif'`:
* Before: 12.6 ms ± 0.6 ms (198 runs)
* Previous commit: 16.5 ms ± 0.9 ms (153 runs)
* This commit: 13.5 ms ± 0.6 ms (186 runs)
We still construct the code length codes manually, and now we also
construct a PrefixCodeGroup manually that assigns 8 bits to all
symbols (except for fully-opaque alpha channels, and for the
unused distance codes, like before). But now we use the CanonicalCodes
from that PrefixCodeGroup for writing.
No behavior change at all, the output is bit-for-bit identical to
before. But this is a step towards actually huffman-coding symbols.
This is however a pretty big perf regression. For
`image -o test.webp test.bmp` (where test.bmp is retro-sunset.png
re-encoded as bmp), time goes from 23.7 ms to 33.2 ms.
`animation -o wow.webp giphy.gif` goes from 85.5 ms to 127.7 ms.
`animation -o wow.webp 7z7c.gif` goes from 12.6 ms to 16.5 ms.
No behavior change. No measurable performance difference either.
(I tried `hyperfine 'Build/lagom/bin/image --no-output foo.webp'`
for a few input images before and after this change, and I didn't
see a difference. I also tried moving both
Gfx::CanonicalCode::read_symbol() and
Compress::CanonicalCode::read_symbol() inline, and that didn't
help either.)
By doing this, we can remove all the special-case checks for `length`
from the generic GetById and GetByIdWithThis code paths, making every
other property lookup a bit faster.
Small progressions on most benchmarks, and some larger progressions:
- 12% on Octane/crypto.js
- 6% on Kraken/ai-astar.js
This returns the secondary target of a mouse event. For `onmouseenter`
and `onmouseover` events, this is the EventTarget the mouse exited
from. For `onmouseleave` and `onmouseout` events, this is the
EventTarget the mouse entered to.
Most of IPC::Connection is thread-safe, but the responsiveness timer is
very much not. This commit makes sure only one thread can send stuff
through IPC, to avoid having threads racing in IPC::Connection.
This is simply less work than making sure everything IPC::Connection
uses (now and later) is thread-safe.
Previously RS handled all the requests in an event loop, leading to
issues with connections being started in the middle of other connections
being started (and potentially blowing up the stack), ultimately causing
requests to be delayed because of other requests.
This commit reworks the way we handle these (specifically starting
connections) by first serialising the requests, and then performing them
in multiple threads concurrently, which yields a significant loading
performance and reliability increase.
Previously sharing a Timer/Notifier between threads (or just handing
its ownership to another thread) led to a crash, as they are
thread-specific.
This commit makes it so we can handle mutation (i.e. just deletion
or unregistering) in a thread-safe and lock-free manner.
The VERIFY assumed that we either have a finally block which we already
visited through a previous boundary, or have no finally block
whatsoever. But since 301a1fc763 we propagate finalizers through unwind
scopes, breaking that assumption.
I saw that this not being implemented was causing a JavaScript
exception to be thrown when loading https://profiler.firefox.com/.
I was hoping that, like last time, partially implementing this
interface would allow the page to load further, but it seems that this
is unfortunately not the case this time!
However, this may help other pages load slightly further, and puts some
of the scaffolding in place for a proper implementation :^)
For now, this slot is always 0 (the default value per spec). But
once we start actually processing audio streams this internal slot
should be changed correspondingly.
This will help us in detecting potential web compatibility issues from
not having this implemented.
While we're at it, update the spec link, as it was moved from the DOM
parsing spec to the HTML one, and implement this function in a manner
that more closely resembles the spec text.
To determine the palette of colors we use the median cut algorithm.
While the implementation is correct, there is obvious room for
improvement in both the median cut algorithm and the encoding side.
This is useful to find the best matching color palette from an existing
bitmap. It can be used in PixelPaint but also in encoders of old image
formats that only support indexed colors e.g. GIF.
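For reference, a compact sketch of median cut (hypothetical helpers;
this variant splits the most populous bucket, and the real
implementation may choose differently):

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

struct Rgb {
    uint8_t r, g, b;
};

// Channel (0=r, 1=g, 2=b) with the widest min/max range in this bucket.
static int widest_channel(std::vector<Rgb> const& bucket)
{
    std::array<int, 3> lo { 255, 255, 255 }, hi { 0, 0, 0 };
    for (auto const& color : bucket) {
        std::array<int, 3> v { color.r, color.g, color.b };
        for (int i = 0; i < 3; ++i) {
            lo[i] = std::min(lo[i], v[i]);
            hi[i] = std::max(hi[i], v[i]);
        }
    }
    std::array<int, 3> range { hi[0] - lo[0], hi[1] - lo[1], hi[2] - lo[2] };
    return int(std::max_element(range.begin(), range.end()) - range.begin());
}

std::vector<Rgb> median_cut(std::vector<Rgb> colors, size_t palette_size)
{
    std::vector<std::vector<Rgb>> buckets { std::move(colors) };
    while (buckets.size() < palette_size) {
        // Split the biggest bucket along its widest channel, at the median.
        auto largest = std::max_element(buckets.begin(), buckets.end(),
            [](auto const& a, auto const& b) { return a.size() < b.size(); });
        if (largest->size() < 2)
            break;
        int channel = widest_channel(*largest);
        std::sort(largest->begin(), largest->end(), [channel](Rgb const& a, Rgb const& b) {
            std::array<int, 3> va { a.r, a.g, a.b }, vb { b.r, b.g, b.b };
            return va[channel] < vb[channel];
        });
        std::vector<Rgb> upper(largest->begin() + largest->size() / 2, largest->end());
        largest->resize(largest->size() / 2);
        buckets.push_back(std::move(upper));
    }
    // One palette entry per bucket: the average color of that bucket.
    std::vector<Rgb> palette;
    for (auto const& bucket : buckets) {
        uint64_t r = 0, g = 0, b = 0;
        for (auto const& color : bucket) {
            r += color.r;
            g += color.g;
            b += color.b;
        }
        auto n = std::max<size_t>(bucket.size(), 1);
        palette.push_back({ uint8_t(r / n), uint8_t(g / n), uint8_t(b / n) });
    }
    return palette;
}
```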
It wasn't safe to use addition_would_overflow(a, -b) to check if
subtraction (a - b) would overflow, since it doesn't cover this case.
I don't know why we didn't have subtraction_would_overflow(), so this
patch adds it. :^)
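A minimal sketch of such a helper using the compiler builtin (presumably
the real one sits next to addition_would_overflow() in AK):

```cpp
#include <cstdint>

template<typename T>
bool subtraction_would_overflow(T a, T b)
{
    T result;
    return __builtin_sub_overflow(a, b, &result);
}

// Why addition_would_overflow(a, -b) isn't a safe substitute: for
// b == INT32_MIN, -b is itself not representable as an int32_t, so the
// addition-based check already starts from an overflowed value, while
// subtraction_would_overflow(0, INT32_MIN) correctly reports an overflow
// (0 - INT32_MIN would be 2^31, one past INT32_MAX).
```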
Previously, when looking for the labeled control of a label element, we
were only checking its child elements. The specification says we should
check all elements in the same tree as the label element.
* Matches how the loader is organized
* `compress_VP8L_image_data()` will grow longer when we add actual
compression
* Maybe someone wants to write a lossy compressor one day
No behavior change.
This code path now also compresses to memory once, and then writes to
the output stream.
Since the animation writer has a SeekableStream, it could compress to
the stream directly and fix up offsets later. That's more complicated
though, and keeping the animated and non-animated code paths similar
seems nice. And the drawback is just temporary higher memory use, and
the used memory is smaller than the memory needed by the input bitmap.
Before, we used to compress the image data to memory, then make another
copy to memory, and then write to the output stream.
Now, we compress to memory once and then write to the output stream.
No behavior change.
It is now possible to pass an optional `ImageDataSettings` object to
the `CanvasImageData.createImageData()` and
`CanvasImageData.getImageData()` methods.
We'll want to explicitly load fonts from FontFace and other Web APIs
in the future. A future refactor should also move this completely away
from StyleComputer and call it something like 'FontCache'.
With this only `ContinuePendingUnwind` needs to dynamically check if a
scheduled return needs to go through a `finally` block, making the
interpreter loop a bit nicer.
This actually allows us to re-introduce the ldd utility as a symlink to
our dynamic loader, so now ldd behaves exactly like on Linux - it will
load all dynamic dependencies for an ELF executable.
This has the advantage that running ldd on an ELF executable will
provide an exact preview of the order in which the dynamic loader
loads the executable and its dependencies.
As a preparation to introducing ldd as a symlink to /usr/lib/Loader.so,
we rename the ldd utility to elfdeps, as its sole purpose is to list
ELF object dependencies, and not how the dynamic loader loads them.
This is useful for testing ELF binaries that expose other
functionalities based on the argv0 string (BuggieBox, for example, does
it to determine which utility to run).
This change essentially gives the DynamicLoader 2 roles - the first
one is to be invoked by the kernel to dynamically link an ELF executable
at runtime.
The second role is to allow running ELF executables explicitly from
userspace so the kernel runs the DynamicLoader as the "intended" program
but now the DynamicLoader can do its own commandline argument parsing
and run a specified binary, with future options being possible to
implement easily.
This will be used in the DynamicLoader code, as it can't do syscalls via
LibCore code.
Because we can't use most of the LibCore code, we convert the
versioning code in Version.cpp to use the LibC uname() function.
Instead of SetVariable having 2x2 modes for variable/lexical and
initialize/set, those 4 modes are now separate instructions, which
makes each instruction much less branchy.
The last completion value in a function is not exposed to the language,
since functions always either return something, or undefined.
Given this, we can avoid emitting code that propagates the completion
value from various statements, as long as we know we're generating code
for a context where the completion value is not accessible. In practical
terms, this means that function code gets to do less completion
shuffling, while global and eval code has to keep doing it.
We add a prctl option which would be called once after the dynamic
loader has finished doing text relocations, before calling the actual
program entry point.
This change makes it much more obvious when we are allowed to change
a region protection access from being writable to executable.
The dynamic loader should be able to do this, but after a certain point
it is obvious that such a mechanism should be disabled.
This flag is set only once, and should never be reset once it has been
set, making it an ideal SetOnce use-case.
It also simplifies the expected conditions for the enabling prctl call:
we don't expect a boolean flag; rather, the specific prctl option always
sets (enables) the Process' AddressSpace syscall region enforcement.
Now both /bin/zcat and /bin/gunzip are symlinks to /bin/gzip, and we
essentially run it in decompression mode through these symlinks.
This ensures we don't maintain 2 versions of code to decompress Gzipped
data anymore, and handle the use case of gzipped-streaming input only
once in the codebase.
For example, for 7z7c.gif, we now store one 500x500 frame and then
a 94x78 frame at (196, 208) and a 91x78 frame at (198, 208).
This reduces how much data we have to store.
We currently store all pixels in the rect with changed pixels.
We could in the future store pixels that are equal in that rect
as transparent pixels. When inputs are gif files, this would
guarantee that new frames only have at most 256 distinct colors
(since GIFs require that), which would help a future color indexing
transform. For now, we don't do that though.
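The core of it is a bounding-box diff of two frames, roughly like this
(simplified stand-in types, and assuming both frames have the same
size):

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

struct Bitmap {
    int width { 0 };
    int height { 0 };
    std::vector<uint32_t> pixels; // width * height BGRA values
    uint32_t pixel(int x, int y) const { return pixels[y * width + x]; }
};

struct IntRect {
    int x, y, width, height;
};

// Returns nullopt if the frames are identical (nothing needs to be stored).
std::optional<IntRect> changed_rect(Bitmap const& previous, Bitmap const& current)
{
    int min_x = current.width, min_y = current.height, max_x = -1, max_y = -1;
    for (int y = 0; y < current.height; ++y) {
        for (int x = 0; x < current.width; ++x) {
            if (previous.pixel(x, y) != current.pixel(x, y)) {
                min_x = std::min(min_x, x);
                min_y = std::min(min_y, y);
                max_x = std::max(max_x, x);
                max_y = std::max(max_y, y);
            }
        }
    }
    if (max_x < 0)
        return std::nullopt;
    // WebP frame offsets have to be even, so the real writer may have to
    // widen this rect by one pixel on the left/top.
    return IntRect { min_x, min_y, max_x - min_x + 1, max_y - min_y + 1 };
}
```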
The API I'm adding here is a bit ugly:
* WebPs can only store x/y offsets that are a multiple of 2. This
currently leaks into the AnimationWriter base class.
(Since we potentially have to make a webp frame 1 pixel wider
and higher due to this, it's possible to have a frame that has
<= 256 colors in a gif input but > 256 colors in the webp,
if we do the technique above.)
* Every client writing animations has to have logic to track
previous frames, decide which of the two functions to call, etc.
This also adds an opt-out flag to `animation`, because:
1. Some clients apparently assume the size of the last VP8L
chunk is the size of the image
(see https://github.com/discord/lilliput/issues/159).
2. Having incremental frames is good for filesize and for
playing the animation start-to-end, but it makes it hard
to extract arbitrary frames (have to extract all frames
from start to target frame) -- but this is meant to be a
delivery codec, not an editing codec. It's also more vulnerable to
corrupted bytes in the middle of the file -- but transport
protocols are good these days.
(It'd also be an idea to write a full frame every N frames.)
For https://giphy.com/gifs/XT9HMdwmpHqqOu1f1a (a 184K gif),
output webp size goes from 21M to 11M.
For 7z7c.gif (an 11K gif), output webp size goes from 2.1M to 775K.
(The webp image data still isn't compressed at all.)
This means that SetVariable instructions will now remember which
(relative) environment contains the targeted binding, letting it bypass
the full binding resolution machinery on subsequent accesses.
Truncating the value is mathematically incorrect; this error made the
conversion to grayscale unstable. In other words, calling `to_grayscale`
on a gray value would return a different value. As an example,
`Color::from_string("#686868ff"sv).to_grayscale()` used to return
#676767ff.
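The gist of the fix, with illustrative luma weights (the exact
coefficients aren't the point; the lround() vs. truncation is):

```cpp
#include <cmath>
#include <cstdint>

uint8_t gray_value(uint8_t r, uint8_t g, uint8_t b)
{
    double luma = 0.2126 * r + 0.7152 * g + 0.0722 * b;
    // Truncating here (static_cast<uint8_t>(luma)) can turn 103.9999...
    // into 103, so a color that is already gray wouldn't map back to
    // itself. Rounding keeps the conversion stable.
    return static_cast<uint8_t>(std::lround(luma));
}
```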
Merging registers, constants and locals into a single vector means:
- Better data locality
- No need to check type in Interpreter::get() and Interpreter::set()
which are very hot functions
Performance improvement is visible in almost all Octane and Kraken
tests.
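Sketch of the resulting layout (hypothetical, heavily simplified types):

```cpp
#include <cstddef>
#include <vector>

struct Value { /* JS value */ };

struct Operand {
    // Registers occupy [0, register_count), constants follow, then locals;
    // an operand is just an index into that one flat array.
    size_t index { 0 };
};

class Interpreter {
public:
    // No "is this a register, a constant or a local?" branch in the hot path.
    Value get(Operand operand) const { return m_registers_constants_and_locals[operand.index]; }
    void set(Operand operand, Value value) { m_registers_constants_and_locals[operand.index] = value; }

private:
    std::vector<Value> m_registers_constants_and_locals;
};
```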
Instead of copying the image data pixel-by-pixel, we can memcpy full
scanlines at a time.
This knocks a 4% item down to <1% in profiles of Another World JS.
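Sketch of the pattern (simplified buffers, strides given in pixels):

```cpp
#include <cstdint>
#include <cstring>

void copy_rect(uint32_t const* source, int source_stride,
    uint32_t* destination, int destination_stride, int width, int height)
{
    for (int y = 0; y < height; ++y) {
        // One memcpy per scanline instead of `width` individual pixel copies.
        std::memcpy(destination + y * destination_stride,
            source + y * source_stride,
            width * sizeof(uint32_t));
    }
}
```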
Filling a typed array with an integer shouldn't have to go through the
generic Set for every element.
This knocks a 7% item down to <1% in profiles of Another World JS at
https://cyxx.github.io/another_js/
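Sketch of the fast path for an Int32Array (hypothetical signature; the
point is that the value is converted once, outside the loop):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

void fill_int32_array(std::vector<int32_t>& elements, size_t start, size_t end, int32_t converted_value)
{
    // The JS fill value goes through ToInt32 exactly once, in the caller;
    // the loop itself is just a plain store per element, instead of a
    // generic Set() with per-element conversion and bounds bookkeeping.
    for (size_t i = start; i < end; ++i)
        elements[i] = converted_value;
}
```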
When comparing two numbers, we can avoid a lot of implicit type
conversion nonsense and go straight to comparison, saving time in the
most common case.
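Sketch of the fast path, with a toy two-alternative Value standing in
for the real one:

```cpp
#include <string>
#include <variant>

using Value = std::variant<double, std::string>; // hypothetical, simplified JS value

// Stand-in for the spec's slow abstract relational comparison.
static bool abstract_less_than(Value const& lhs, Value const& rhs)
{
    auto to_number = [](Value const& value) {
        if (auto const* number = std::get_if<double>(&value))
            return *number;
        return std::stod(std::get<std::string>(value)); // crude ToNumber stand-in
    };
    return to_number(lhs) < to_number(rhs);
}

bool less_than(Value const& lhs, Value const& rhs)
{
    // Fast path: both operands are already numbers, compare them directly.
    if (auto const* a = std::get_if<double>(&lhs)) {
        if (auto const* b = std::get_if<double>(&rhs))
            return *a < *b;
    }
    return abstract_less_than(lhs, rhs);
}
```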
To align the cursor theme tab with how the DisplaySettings themes tab
works, this change forces the theme combo box to not allow free-text.
Currently
on a click it puts the text cursor in the box to allow typing anything
rather than acting as a dropdown when clicking anywhere on the field.
Fixes #24306
These were out-of-line because we had some ideas about marking
instruction streams PROT_READ only, but that seems pretty arbitrary and
there's a lot of performance to be gained by putting these inline.
Performing a lookup in the blob URL registry does not work in the case
of a web worker - as the registry is not shared between processes.
However - the URL itself passed to a worker has the blob attached to it,
which we can pull out of the URL on a fetch.
Implement this function to the spec, and use the full-blown URL parser
that handles blob URLs, instead of the basic-url-parser.
Also clean up a FIXME that does not seem relevant any more.
Instead, generate bytecode to execute their AST nodes and save the
resulting operands inside the NewClass instruction.
Moving property expression evaluation to happen before NewClass
execution also moves along creation of new private environment and
its population with private members (private members should be visible
during property evaluation).
Before:
- NewClass
After:
- CreatePrivateEnvironment
- AddPrivateName
- ...
- AddPrivateName
- NewClass
- LeavePrivateEnvironment
This matches the IDL definition. Without this we experience a compile
time error when adding the WheelEvent constructor due to a mismatched
type between the enum class and UnsignedLong that the prototype is
passing through to the event init struct for the constructor.
This was an unfortunate artifact from development. I originally added
this class in the DOM namespace alongside HTMLCollection until I
realized it was defined by the HTML spec instead. To remind myself to
move it over to the HTML namespace, I left this comment. It was moved
to the correct namespace before upstreaming to the main repo, but I
forgot to remove this comment!
Once we see a frame with transparent pixels, we now toggle the
"has alpha" bit in the header.
To not require a SeekableStream opened for reading, we now pass the
unmodified original flag bit to WebPAnimationWriter.
The high-level design is that we have a static method on WebPWriter that
returns an AnimationWriter object. AnimationWriter has a virtual method
for writing individual frames. This allows streaming animations to disk,
without having to buffer up the entire animation in memory first.
The semantics of this function, add_frame(), are that data is flushed
to disk every time the function is called, so that no explicit `close()`
method is needed.
For some formats that store animation length at the start of the file,
including WebP, this means that this needs to write to a SeekableStream,
so that add_frame() can seek to the start and update the size when a
frame is written.
This design should work for GIF and APNG writing as well. We can move
AnimationWriter to a new header if we add writers for these.
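The shape of the API, roughly (illustrative signatures, not the exact
LibGfx ones):

```cpp
#include <memory>

class Bitmap;
class SeekableStream;

class AnimationWriter {
public:
    virtual ~AnimationWriter() = default;

    // Writes one frame and flushes it; there is no close() -- the file on
    // disk is a valid animation after every add_frame() call.
    virtual bool add_frame(Bitmap const& frame, int duration_ms) = 0;
};

class WebPWriter {
public:
    // Needs a SeekableStream so the sizes stored near the start of the file
    // can be patched after each appended frame.
    static std::unique_ptr<AnimationWriter> start_encoding_animation(SeekableStream&, int loop_count = 0);
};
```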
Currently, `animation` can read any animated image format we can read
(apng, gif, webp) and convert it to an animated webp file.
The written animated webp file is not compressed whatsoever, so this
creates large output files at the moment.
This patch stops emitting the BlockDeclarationInstantiation instruction
when there are no locals, and no function declarations in the scope.
We were spending 20% of CPU time on https://ventrella.com/Clusters/ just
creating empty environments for no reason.
Instead of relying on native stack overflows to kick us out of circular
proxy chains, we now keep track of the recursion depth and kick
ourselves out if it exceeds 10'000.
This fixes an issue where compiler tail/sibling call optimizations would
turn infinite recursion into infinite loops, and thus never hit a stack
overflow to kick itself out.
For whatever reason, we've only seen the issue on SerenityOS with UBSAN,
but it could theoretically happen on any platform.
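Sketch of the guard (hypothetical names and structure):

```cpp
#include <stdexcept>

constexpr int max_proxy_recursion_depth = 10'000;

struct ProxyObject {
    ProxyObject* target { nullptr }; // may itself be a proxy, possibly circularly

    bool has_property(int recursion_depth = 0) const
    {
        // Bail out long before the native stack could overflow; stand-in
        // for throwing a JS InternalError.
        if (recursion_depth > max_proxy_recursion_depth)
            throw std::runtime_error("Proxy chain is too deep");
        if (target)
            return target->has_property(recursion_depth + 1);
        return false;
    }
};
```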
If the minimal number of required bindings is known in advance, it can
be used to ensure capacity and avoid resizing the internal vector that
holds bindings.
By doing that, all instructions required for instantiation are emitted
once during compilation and then reused for subsequent calls, instead of
running the generic instantiation process for each call.
Fix the incorrect parsing in `Chess::Move::from_algebraic`
of moves with a capture and promotion (e.g. gxf8=Q) due to
the promoted piece type not being passed through to the
`Board::is_legal` method. This addresses a FIXME regarding
this issue in `ChessWidget.cpp`.
Instead of storing a full JS::Completion for the "throw completion"
case, we now store a simple JS::Value (wrapped in ErrorValue for the
type system).
This shrinks TCO<void> and TCO<Value> (the most common TCOs) by 8 bytes,
which has a non-trivial impact on performance.
If the callee is already a temporary register, we don't need to copy it
to *another* temporary before evaluating arguments. None of the
arguments will clobber the existing temporary anyway.
We only need to make copies of locals here, in case the locals are
modified by something like increment/decrement expressions.
Registers and constants can slip right through, without being Mov'ed
into a temporary first.
We know that `undefined` in the global scope is always the proper
undefined value. This commit takes advantage of that by simply emitting
a constant undefined value instead.
Unfortunately we can't be so sure in other scopes.
This helps some of the Cloudflare Turnstile stuff run faster, since they
are deliberately screwing with JS engines by asking us to do a bunch of
bitwise operations on e.g. 65535.56
By rounding such values in bytecode generation, the interpreter can stay
on the happy path while executing, and finish quite a bit faster.
We now fuse sequences like [LessThan, JumpIf] to JumpLessThan.
This is only allowed for temporaries (i.e. VM registers) with no other
references to them.
Specify the time in seconds to wait for a reply for each packet sent.
Replies received out of order will not be printed as replied but they
will be considered as replied when calculating statistics. Setting it
to 0 means infinite timeout.
In flood ping mode, the time interval between each request is set
to zero to provide a rapid display of how many packets are being
dropped. For each request a period '.' is printed, while for every
reply a backspace is printed.
Before this change, switch codegen would interleave bytecode like this:
(test for case 1)
(code for case 1)
(test for case 2)
(code for case 2)
This meant that we often had to make many large jumps while looking for
the matching case, since code for each case can be huge.
It now looks like this instead:
(test for case 1)
(test for case 2)
(code for case 1)
(code for case 2)
This way, we can just fall through the tests until we hit one that fits,
without having to make any large jumps.
This removes a layer of indirection in the bytecode where we had to make
sure all the initializer elements were laid out in sequential registers.
Array expressions no longer clobber registers permanently, and they can
be reused immediately afterwards.
This patch adds a register freelist to Bytecode::Generator and switches
all operands inside the generator to a new ScopedOperand type that is
ref-counted and automatically frees the register when nothing uses it.
This dramatically reduces the size of bytecode executable register
windows, which were often in the several thousands of registers for
large functions. Most functions now use less than 100 registers.
The first block in every executable will always execute first, so if it
ends up doing a ResolveThisBinding, it's fine for all other blocks
within the same executable to use the same `this` value.
In the common case, parseInt() is being called with strings that don't
have leading whitespace. By checking for that first, we can avoid the
cost of trimming and multiple malloc/GC allocations.
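Sketch of the check (simplified to ASCII whitespace; the real parseInt
has to handle the full set of JS whitespace):

```cpp
#include <cctype>
#include <string_view>

std::string_view trim_leading_whitespace_if_needed(std::string_view input)
{
    // Fast path: the first character isn't whitespace, so there is nothing
    // to trim and no extra work or allocation to do.
    if (input.empty() || !std::isspace(static_cast<unsigned char>(input.front())))
        return input;
    auto first = input.find_first_not_of(" \t\n\r\f\v");
    return first == std::string_view::npos ? std::string_view {} : input.substr(first);
}
```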
Once executed, this instruction will always produce the same result
in subsequent executions, so it's okay to cache it.
Unfortunately it may throw, so we can't just hoist it to the top of
every executable, since that would break observable execution order.
Two bugs:
1. Correctly set bits in VP8X header.
Turns out these were set in the wrong order.
2. Correctly set the `has_alpha` flag.
Also add a test for writing webp files with icc data. With the
additional checks in other commits in this PR, this test catches
the bug in WebPWriter.
Rearrange some existing functions to make it easier to write this test:
* Extract encode_bitmap() from get_roundtrip_bitmap().
encode_bitmap() allows passing extra_args that the test uses to pass
in ICC data.
* Extract expect_bitmaps_equal() from test_roundtrip()
If this turns out to be too strict in practice, we can replace it with
a `dbgln("VP8X and VP8L headers disagree about alpha; ignoring VP8X");`
instead.
Also update catdog-alert-13-alpha-used-false.webp to not trigger this.
I had manually changed the VP8L alpha flag at offset 0x2a in
da48238fbd to clear it, but I hadn't changed the VP8X flag.
This changes the byte at offset 0x14 from 0x10 (has_alpha) to 0x00
(no alpha) as well, to match.
If this turns out to be too strict in practice, we can replace it with
a dbgln() with the same message, but with ", ignoring header."
tacked on at the end.
The stream passed to LineProgram::create is a temporary, so rather than
keeping an (unused) dangling reference, pass this stream explicitly to
all functions used by LineProgram::create.