If an image has 256 or fewer colors, WebP/Lossless allows storing
the colors in a helper image, and then storing just indexes into that
helper image in the main image's green channel, while setting
r, b, and a of the main image to 0.
Since constant-color channels need to space to store in WebP,
this reduces storage needed to 1/4th (if alpha is used) or 1/3rd
(if alpha is constant across the image).
If an image has <= 16 colors, WebP lossless files pack multiple
color table indexes into a single pixel's green channel, further
reducing file size. This pixel packing is not yet implemented in
this commit.
GIFs can store at most 256 colors per frame, so animated gifs
often have 256 or fewer colors, making this effective when
transcoding gifs.
(WebP also has a "subtract green" transform, which can be used
to need to store just a single channel for grayscale images, without
having to store a color table. That's not yet implemented -- for now,
we'll now store grayscale images using this color indexing transform
instead, which wastes to storage for the color table.)
(If an image has <= 256 colors but all these colors use only a single
channel, then storing a color table for these colors is also wasteful,
at least if the image has > 16 colors too. That's rare in practice,
but maybe we can add code for it later on.)
(WebP also has a "color cache" feature where the last few used colors
can be referenced using very few bits. This is what the webp spec says
is similar to palettes as well. We don't implement color cache writing
support yet either; maybe it's better than using a color indexing
transform for some inputs.)
Some numbers on my test files:
sunset-retro.png: No performance or binary size impact. The input
quickly uses more than 256 colors.
giphy.gif (184k): 4.1M -> 3.9M, 95.5 ms ± 4.9 ms -> 106.4 ms ± 5.3 ms
Most frames use more than 256 colors, but just barely. So fairly
expensive runtime wise, with just a small win.
(See comment on #24454 for the previous 4.9 MiB -> 4.1 MiB drop.)
7z7c.gif (11K): 118K -> 40K
Every frame has less than 256 colors (but more than 16, so no packing),
and so we can cut filesize roughly to 1/3rd: We only need to store an
index per channel. From 10.7x as large as the input to 3.6x as large.
No behavior change, but this makes it easy to correctly set this
flag when adding an indexing transform: Opacity then needs to be
determined based on if colors in the color table have opacity,
not if the indexes into the color table do.
With this struct, only the first time something sets opacity is
honored, giving us those semantics.
In an early version of the huffman writing code, we always used 8 bits
here, and the comments still reflected that. Since we're now always
writing only as many bits as we need (in practice, still almost always
8), the comments are misleading.
This allows the browser to send a query to the WebContent process,
which will search the page for the given string and highlight any
occurrences of that string.
This works for now, but is technically still not spec compliant. Right
now, we're (potentially) missing one bit when reading function indices.
See the relevant issue: #24462.
There was an issue with Clang that causes `consteval` function calls
from default initializers of fields to be made at run-time. This
manifested itself in the case of `ByteString::formatted` as an undefined
reference to `check_format_parameter_consistency` once format string
checking was enabled for Clang builds. The workaround is simple (just
move it to the member initializer list), and unblocks a useful change.
This adds a `--experimental-cpu-transforms` option to Ladybird and
WebContent (which defaults to false/off).
When enabled the AffineCommandExecutorCPU will be used to handle
painting transformed stacking contexts (i.e. stacking contexts where
the transform is something other than a simple translation). The regular
command executor will still handle the non-transformed cases.
This is hidden under a flag as the `AffineCommandExecutorCPU` is very
incomplete now. It missing support for clipping, text, and other basic
commands. Once most common commands have been implemented this flag
will be removed.
This CommandExecutor is intended to provide better support for painting
stacking contexts where the transform is not a simple translation. It
is not intended to replace the CPU command executor (as its methods of
painting will likely be slower for the non-transformed case), instead,
it will function as a companion executor to handle transformations.
This is only intended to properly handle 2D transformations (skews,
rotations, scaling, etc). Full support for 3D transformations would
need further changes in LibGfx.
As it stands this is (very) incomplete and experimental, but hopefully,
this can be fleshed out to the point where it supports most common
painting commands.
This adds two new CommandResults: ContinueWithNestedExecutor and
ContinueWithParentExecutor.
ContinueWithNestedExecutor switches the command executor to the result
of calling `.nested_executor()` on the current CommandExecutor.
ContinueWithParentExecutor returns to the previous command executor
(i.e. what it was before the last ContinueWithNestedExecutor).
Using `draw_scaled_bitmap()` for simple scales is almost always better
than using `draw_scaled_bitmap_with_transform()` which does not respect
the scaling mode.
`Painting::paint_all_borders()` only uses `.draw_line()` for simple
borders and `.fill_path()` for more complex cases. These are both
already supported by the `RecordingPainter` so removing this command
simplifies the painting API.
Two test changes:
css-background-clip-text: Borders are now drawn via the AA painter
(which makes them closer to how they appear in other browsers).
corner-clip-inside-scrollable: Borders removed (does not change test)
due to imperceptible sub-pixel changes.
The check from 0805319d1e is too strict for animated images: It's
possible that an animation has alpha, but a frame doesn't.
(For example, if the animation is 10x10 pixels, but one frame updates
just 1x1 pixels. But even a full-sized frame could just not have alpha.)
If the VP8X header claims that the animation has alpha, we could check
that at least one frame has alpha, but for now we just don't do any
checking for animated images.
...instead of scheduling repaint timer in PageClient.
This change fixes flickering on Discord that happened because:
- Event loop schedules repainting by activating repaint timer
- `Document::tear_down_layout_tree()` destroys paintable tree
- Repaint timer invokes callback and renders an empty frame because
paintable tree was destroyed
Although refreshing is cheap, it was performed before each hit-testing
and was 2-4% in profiles on Discord and Twitter.
Now clip and scroll states are refreshed only if scroll offset has
changed.
Previously, we always cast to a HTMLInputElement when getting the value
of an auto directionality form associated element. This caused
undefined behavior when determining the directionality of an element
that wasn't a HTMLInputElement.
Previously, we assumed that all label control paintables were of type
`LabelablePaintable`. This caused a crash when clicking on a label with
a text input control.
As with all other current audio nodes we still need to wire up the
inputs and outputs so it can be properly used in an audio context - but
this is enough to implement the public IDL interface.
...instead of allocating separate BorderRadiusCornerClipper for each
executed sample/blit commands pair.
With this change a vector of BorderRadiusCornerClipper has far fewer
items. For example on twitter profile page its size goes down from
~3000 to ~3 items.
Before, this check was needed to prevent crashing when attempting to
allocate zero-size bitmap for sampled corners, which could have happened
if a corner had 0 radius in one axis.
Now, since SampleUnderCorners command is not emmited when radius is 0
in one axis, this check is no longer needed.
This reverts commit 6b7b9ca1c4.
The whole corner radius is invisible if it has 0 radius in any axis, so
the reverted commit was a mistake that led to error checking during
painting command execution b61aab66d9 to
avoid crashing on attempt to allocate 0 size bitmap.
Deflate and WebP can store at most 15 bits per symbol, meaning their
huffman trees can be at most 15 levels deep.
During construction, when we hit this level, we used to try again
with an ever lower frequency cap per symbol. This had the effect
of giving the symbols with the highest frequency lower frequencies
first, causing the most-frequent symbols to be merged. For example,
maybe the most-frequent symbol had 1 bit, and the 2nd-frequent
two bits (and everything else at least 3). With the cap, the two
most frequent symbols might both have 2 symbols, freeing up bits
for the lower levels of the tree.
This has the effect of making the most-frequent symbols longer at
first, which isn't great for file size.
Instead of using a frequency cap, ignore ever more of the low
bits of the frequency. This sacrifices resolution where it hurts
the lower levels of the tree first, and those are stored less
frequently.
For deflate, the 64 kiB block size means this doesn't have a big
effect, but for WebP it can have a big effect:
sunset-retro.png (876K): 2.02M -> 1.73M -- now (very slightly) smaller
than twice the input size! Maybe we'll be competitive one day.
(For wow.webp and 7z7c.webp, it has no effect, since we don't hit
the "tree too deep" case there, since those have relatively few
colors.)
No behavior change other than smaller file size. (No performance
cost either, and it's less code too.)
For our deflate, block size is limited to less than 64 kiB, so the sum
of all byte frequencies always fits in a u16 by construction.
But while I haven't hit this in practice, but it can conceivably happen
when writing WebP files, which currently use a single huffman tree
(per channel) for a while image -- which is often much larger than
64 kiB.
No dramatic behavior change in practice, just feels more correct.
This implements some of basic webp compression: Huffman coding.
(The other parts of the basics are backreferences, and color cache
entries; and after that there are the four transforms -- predictor,
subtract green, color indexing, color.)
How much huffman coding helps depends on the input's entropy.
Constant-color channels are now encoded in constant space, but
otherwise a huffman code always needs at least one bit per symbol.
This means just huffman coding can at the very best reduce output
size to 1/8th of input size.
For three test input files:
sunset-retro.png (876K): 2.25M -> 2.02M
(helps fairly little; from 2.6x as big as the png input to 2.36x)
giphy.gif (184k): 11M -> 4.9M
(pretty decent, from 61x as big as the gif input to 27x as big)
7z7c.gif (11K): 775K -> 118K
(almost as good as possible an improvement for just huffman coding,
from 70x as big as the gif input to 10.7x as big)
No measurable encoding perf impact for encoding.
The code is pretty similar to Deflate.cpp in LibCompress, with just
enough differences that sharing code doesn't look like it's worth
it to me. I left comments outlining similarities.
To be used in WebPWriter.
JPEGWriter currently hardcodes huffman tables; maybe it can use this
to build data-dependent huffman tables in the future as well.
Pure code move (except for removing the `DeflateCompressor::` prefix
on the function's name, and putting the default argument for the 4th
argument in the function's definition), no behavior change.
EventSource allows opening a persistent HTTP connection to a server over
which events are continuously streamed.
Unfortunately, our test infrastructure does not allow for automating any
tests of this feature yet. It only works with HTTP connections.
Ran into a crash here while testing LibProtocol changes. The method we
invoke here (did_progress) already accepts an Optional, and handles when
that Optional is empty. So there's no need to assume `total_size` is
non-empty.
Supporting unbuffered fetches is actually part of the fetch spec in its
HTTP-network-fetch algorithm. We had previously implemented this method
in a very ad-hoc manner as a simple wrapper around ResourceLoader. This
is still the case, but we now implement a good amount of these steps
according to spec, using ResourceLoader's unbuffered API. The response
data is forwarded through to the fetch response using streams.
This will eventually let us remove the use of ResourceLoader's buffered
API, as all responses should just be streamed this way. The streams spec
then supplies ways to wait for completion, thus allowing fully buffered
responses. However, we have more work to do to make the other parts of
our fetch implementation (namely, Body::fully_read) use streams before
we can do this.
This adds an alternate API to ResourceLoader to load HTTP/HTTPS/Gemini
requests unbuffered. Most of the changes here are moving parts of the
existing ResourceLoader::load method to helper methods so they can be
re-used by the new ResourceLoader::load_unbuffered.
LibWeb will need to use unbuffered requests to support server-sent
events. Connection for such events remain open and the remote end sends
data as HTTP bodies at its leisure. The browser needs to be able to
handle this data as it arrives, as the request essentially never
finishes.
To support this, this make Protocol::Request operate in one of two
modes: buffered or unbuffered. The existing mechanism for setting up a
buffered request was a bit awkward; you had to set specific callbacks,
but be sure not to set some others, and then set a flag. The new
mechanism is to set the mode and the callbacks that the mode needs in
one API.
This is to avoid including any LibProtocol header in Objective-C source
files, which will cause a conflict between the Protocol namespace and a
@Protocol interface.
See Ladybird/AppKit/Application/ApplicationBridge.cpp for why this
conflict unfortunately cannot be worked around.
All painting commands except SetClipRect are shifted by scroll offset
before command list execution. This change removes scroll offset
translation for sample/blit corner commands in
`PaintableWithLines::paint` so it is only applied once in
`CommandList::apply_scroll_offsets()`.
Currently, these only work when there are no CSS transforms (as the
stacking context painting is not set up to handle that case yet). This
is still enough to get most chat/comment markers working on GitHub
though :^)
The WebAssembly spec never relies on host system information, like
size_t. For consistency's sake, we should stick to the usage of u32's
instead of size_t's. This didn't cause issues before because
LEB128-encoded u64's are a superset of LEB128-encoded u32's.
- Less allocations
- Optimization pass that skips unnecessary sample/blit corner commands
could be more effecient by ignoring corner radius clipping for empty
painting commands
Both page target bitmap and mask bitmap are always BGRA8888, so we can
use `get_pixel<StorageFormat::BGRA8888>` and
`set_pixel<StorageFormat::BGRA8888>` to avoid branching by not
checking a bitmap format.
The if statement in the dispatch implies we are in the idle state, so of
course the active time will always be undefined. If this was cancelled
via a call to cancel(), we can save the time at that point. Otherwise,
just send 0.
I saw a null pointer dereference here on GitHub once, but don't know how
to reproduce, or how we'd get here. Nevertheless, null-checking the
navigable is reasonable so let's do it.
...that maintains a list of allocated execution contexts so malloc()
does not have to be called every time we need to get a new one.
20% improvement in Octane/typescript.js
16% improvement in Kraken/imaging-darkroom.js
This patch fixes two issues:
- Animation events that should go to the target element now do
(some were previously being dispatched on the animation itself.)
- We update the "previous phase" and "previous iteration" fields of
animation effects, so that we can actually detect phase changes.
This means we stop thinking animations always just started,
something that caused each animation to send 60 animationstart
events every second (to the wrong target!)
We enqueue a microtask for this readable stream AO, and the methods that
we call from the chunk steps manipulate promises. As such, we need to
enable callbacks for the entire microtask's temporary execution
context.
By using separate struct we can avoid updating AST node and
ECMAScriptFunctionObject constructors every time there is a need to
add or remove some additional information colllected during parsing.
As it's not uncommon for users to drop the module instance on the floor
after having grabbed the few exports they need to hold on to.
Fixes a few UAFs that show up as "invalid" accesses to
memory/tables/etc.
Adds a TextEditor widget to the chess application to display move
history. It is updated whenever Board::apply_move is called to reflect
the current board state.
Allows to skip function environment allocation for non-arrow functions
if the only reason it is needed is to hold `this` binding.
The parser is changed to do following:
- If a function is an arrow function and uses `this` then all functions
in a scope chain are marked to allocate function environment for
`this` binding.
- If a function uses `new.target` then all functions in a scope chain
are marked to allocate function environment.
`ordinary_call_bind_this()` is changed to put `this` value in execution
context when function environment allocation is skipped.
35% improvement in Octane/typescript.js
50% improvement in Octane/deltablue.js
19% improvement in Octane/raytrace.js
This reverts commit a362c37c8b.
The commit tried to add an unused function that continued our tradition
of not properly waiting for things but instead flooding event loop with
condition checks that delay firing of the event. I think this is a
fundamentally flawed approach. See also checks for
`fire_when_not_visible` in LibCore/EventLoopImplementationUnix.cpp.
This change adds the `width` and `height` properties to
`HTMLVideoElement` and `HTMLSourceElement`. These properties reflect
their respective content attribute values.
With this change, ".*make.*" function family now does error checking
earlier, which improves experience while using clangd. Note that the
change also make them instantiate classes a bit more eagerly, so in
LibVideo/PlaybackManager, we have to first define SeekingStateHandler
and only then make() it.
Co-Authored-By: stelar7 <dudedbz@gmail.com>
A connection's socket is allowed to be null if it's being recreated
after a failed connect(), this was handled correctly in
request_did_finish but not in recreate_socket_if_needed; this commit
fixes this oversight.
This allows us to skip allocating a function environment in cases where
it was previously impossible because the arguments object needed a
binding.
This change does not bring visible improvement in Kraken or Octane
benchmarks but seems useful to have anyway.
ReadLoopReadRequest::on_chunk expects an UInt8Array, so make sure we
convert the passed in ByteBuffer to an UInt8Array before passing it to
the AO writable_stream_default_writer_write.
Co-authored-by: Timothy Flynn <trflynn89@pm.me>
This callback is meant to be triggered by streams, which does not always
provide a WebIDL::DOMException. Pass a plain value instead. Of all the
users of this callback, only one actually uses the value, and already
converts the DOMException to a plain value.
This moves both Gfx::CanonicalCode::write_symbol() and
Compress::CanonicalCode::write_symbol() inline.
It also adds `__attribute__((always_inline))` on the arguments
to visit() in the latter. (ALWAYS_INLINE doesn't work on lambdas.)
Numbers with `ministat`: I ran once:
Build/lagom/bin/image -o test.bmp Base/res/wallpapers/sunset-retro.png
and then ran to bench:
~/src/hack/bench.py -n 20 -o bench_foo1.txt \
Build/lagom/bin/image -o test.webp test.bmp
...and then `ministat bench_foo1.txt bench_foo2.txt` to compare.
The previous commit increased the time for this command by 38% compared
to the before state.
With this, it's an 8.6% regression. So still a regression, but a smaller
one.
Or, in other words, this commit reduces times by 21% compared to the
previous commit.
Numbers with hyperfine are similar -- with this on top of the previous
commit, this is a 7-11% regression, instead of an almost 50% regression.
(A local branch that changes how we compute CanonicalCodes so that we
actually compress a bit is perf-neutral since the image writing code
doesn't change.)
`hyperfine 'image -o test.webp test.bmp'`:
* Before: 23.7 ms ± 0.7 ms (116 runs)
* Previous commit: 33.2 ms ± 0.8 ms (82 runs)
* This commit: 25.5 ms ± 0.7 ms (102 runs)
`hyperfine 'animation -o wow.webp giphy.gif'`:
* Before: 85.5 ms ± 2.0 ms (34 runs)
* Previous commit: 127.7 ms ± 4.4 ms (22 runs)
* This commit: 95.3 ms ± 2.1 ms (31 runs)
`hyperfine 'animation -o wow.webp 7z7c.gif'`:
* Before: 12.6 ms ± 0.6 ms (198 runs)
* Previous commit: 16.5 ms ± 0.9 ms (153 runs)
* This commit: 13.5 ms ± 0.6 ms (186 runs)
We still construct the code length codes manually, and now we also
construct a PrefixCodeGroup manually that assigns 8 bits to all
symbols (except for fully-opaque alpha channels, and for the
unused distance codes, like before). But now we use the CanonicalCodes
from that PrefixCodeGroup for writing.
No behavior change at all, the output is bit-for-bit identical to
before. But this is a step towards actually huffman-coding symbols.
This is however a pretty big perf regression. For
`image -o test.webp test.bmp` (where test.bmp is retro-sunset.png
re-encoded as bmp), time goes from 23.7 ms to 33.2 ms.
`animation -o wow.webp giphy.gif` goes from 85.5 ms to 127.7 ms.
`animation -o wow.webp 7z7c.gif` goes from 12.6 ms to 16.5 ms.
No behavior change. No measurable performance different either.
(I tried `hyperfine 'Build/lagom/bin/image --no-output foo.webp'`
for a few input images before and after this change, and I didn't
see a difference. I also tried if moving both
Gfx::CanonicalCode::read_symbol() and
Compress::CanonicalCode::read_symbol() inline, and that didn't
help either.)
By doing this, we can remove all the special-case checks for `length`
from the generic GetById and GetByIdWithThis code paths, making every
other property lookup a bit faster.
Small progressions on most benchmarks, and some larger progressions:
- 12% on Octane/crypto.js
- 6% on Kraken/ai-astar.js
This returns the secondary target of a mouse event. For `onmouseenter`
and `onmouseover` events, this is the EventTarget the mouse exited
from. For `onmouseleave` and `onmouseout` events, this is the
EventTarget the mouse entered to.
Most of IPC::Connection is thread-safe, but the responsiveness timer is
very much not so, this commit makes sure only one thread can send stuff
through IPC to avoid having threads racing in IPC::Connection.
This is simply less work than making sure everything IPC::Connection
uses (now and later) is thread-safe.
Previously RS handled all the requests in an event loop, leading to
issues with connections being started in the middle of other connections
being started (and potentially blowing up the stack), ultimately causing
requests to be delayed because of other requests.
This commit reworks the way we handle these (specifically starting
connections) by first serialising the requests, and then performing them
in multiple threads concurrently; which yields a significant loading
performance and reliability increase.
Previously sharing a Timer/Notifier between threads (or just handing
its ownership to another thread) lead to a crash as they are
thread-specific.
This commit makes it so we can handle mutation (i.e. just deletion
or unregistering) in a thread-safe and lockfree manner.
The VERIFY assumed that we either have a finally block which we already
visited through a previous boundary or have no finally block what so
ever; But since 301a1fc763 we propagate
finalizers through unwind scopes breaking that assumption.
I saw that this not being implemented was causing a javascript exception
to be thrown when loading https://profiler.firefox.com/.
I was hoping that like the last time that partially implementing this
interface would allow the page to load further, but it seems that this
is unfortunately not the case this time!
However, this may help other pages load slightly further, and puts some
of the scaffolding in place for a proper implementation :^)
For now, this slot is always 0 - (the default value per spec). But
once we start actually processing audio streams this internal slot
should be changed correspondingly.
This will help us in detecting potential web compatability issues from
not having this implemented.
While we're at it, update the spec link, as it was moved from the DOM
parsing spec to the HTML one, and implement this function in a manner
that closr resembles spec text.
To determine the palette of colors we use the median cut algorithm.
While being a correct implementation, enhancements are obviously
existing on both the median cut algorithm and the encoding side.
This is useful to find the best matching color palette from an existing
bitmap. It can be used in PixelPaint but also in encoders of old image
formats that only support indexed colors e.g. GIF.
It wasn't safe to use addition_would_overflow(a, -b) to check if
subtraction (a - b) would overflow, since it doesn't cover this case.
I don't know why we didn't have subtraction_would_overflow(), so this
patch adds it. :^)