Commit graph

59964 commits

Author SHA1 Message Date
Kenneth Myhra 51847bbebf LibWeb: Remove ImageData's create_with_size() and use create() instead
Removes ImageData::create_with_size() and redirects previous usage to
ImageData::create().
2024-03-24 11:09:09 +01:00
Kenneth Myhra 8a1e88677f LibWeb: Add FIXME comments to ImageData.idl
Add FIXME comments for ImageData's missing constructor and attribute
colorSpace.
2024-03-24 11:09:09 +01:00
Kenneth Myhra 30a02fef91 LibWeb: Add one of the two documented constructors to ImageData
Also adds the IDL types:
- dictionary ImageDataSettings
- enum PredefinedColorSpace.
2024-03-24 11:09:09 +01:00
Ali Mohammad Pur 6adf1be06b Shell: Add support for octal escapes in strings
This adds all three common prefixes (\0, \o and \c).
2024-03-24 08:26:56 +01:00
Nico Weber ce4396d6ff MacPDF: Fix capitalization of "Show Images" Debug menu entry 2024-03-24 08:25:31 +01:00
Ali Mohammad Pur 27a38932da LibRegex: Account for extra explicit And/Or in class parser assertion
Fixes #23691.
2024-03-24 08:24:46 +01:00
Nico Weber 259a84ddac Tests/JBIG2: Add a test for symbol and text segment decoding 2024-03-23 17:30:15 -04:00
Nico Weber ced21d8419 LibGfx/JBIG2: Call decode_immediate_text_region for lossless text region
It seems to do the right thing already, and nothing in the spec says
not to do this as far as I can tell.

With this, we can finally decode the test input from #23659.

See f391c7822d for a similar change for generic regions and
lossless generic regions.
2024-03-23 17:30:15 -04:00
Nico Weber b15e1d2b2a LibGfx/JBIG2: Implement initial support for text segments
Text segments conceptually store (x,y,id) triples. (x,y) are a
coordinate, and id refers to an id from a symbol segment.
A text segment has the effect of drawing some of the bitmaps stored
in a symbol segment to the output bitmap.

For example, the symbol segment might contain a small bitmap that
happens to look like the letter 'A', and the text segment might
draw that everywhere a scanned page has an 'A'. (The JBIG2 format
only treats it as an abstract bitmap. It doesn't know that this
small bitmap is an 'A'.)
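
A minimal sketch of the idea, not the actual LibGfx code: `SketchBitmap`, `SymbolInstance` and `draw_text_region()` are hypothetical names, pixels are stored one per byte for readability, and (x,y) is assumed to be the top-left corner of the drawn symbol.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bilevel bitmap; one byte per pixel (0 = white, 1 = black).
struct SketchBitmap {
    size_t width { 0 };
    size_t height { 0 };
    std::vector<uint8_t> pixels; // row-major

    SketchBitmap(size_t w, size_t h) : width(w), height(h), pixels(w * h, 0) {}
    uint8_t get(size_t x, size_t y) const { return pixels[y * width + x]; }
    void set(size_t x, size_t y, uint8_t value) { pixels[y * width + x] = value; }
};

// One (x,y,id) triple as stored conceptually in a text segment.
struct SymbolInstance {
    size_t x;
    size_t y;
    size_t id;
};

// Draws every referenced symbol onto `output`, OR-ing its pixels in and
// skipping pixels that would land outside the output bitmap.
void draw_text_region(SketchBitmap& output, std::vector<SketchBitmap> const& symbols,
    std::vector<SymbolInstance> const& instances)
{
    for (auto const& instance : instances) {
        if (instance.id >= symbols.size())
            continue; // Unknown ID; a real decoder would report an error here.
        auto const& symbol = symbols[instance.id];
        for (size_t sy = 0; sy < symbol.height; ++sy) {
            for (size_t sx = 0; sx < symbol.width; ++sx) {
                size_t ox = instance.x + sx;
                size_t oy = instance.y + sy;
                if (ox >= output.width || oy >= output.height)
                    continue;
                output.set(ox, oy, static_cast<uint8_t>(output.get(ox, oy) | symbol.get(sx, sy)));
            }
        }
    }
}
```

The same symbol bitmap can be referenced by any number of instances, which is where the format gets its compression on scanned text.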

This is missing support for many things:

* Huffman-coded input (not used in practice)
* Symbol refinement
* Transposed symbols
* Colors (not used in practice)

Still, we now have basic symbol/text segment support. This is enough
to decode the downloadable PDF here:
https://www.google.com/books/edition/Paradise_Lost/6qdbAAAAQAAJ

It doesn't lead to any progression on my 1000 file test PDF set.
The 7 files in there that use JBIG2 with symbol and text segments
now fail to load for other reasons (4 need symbol refinement for
text segments, one needs end-of-stripe segment support, one needs
support for symbol segments referring to other segments).

(And possibly, many other PDFs from Google Books, but that's the
only one I've tried so far.)
2024-03-23 17:30:15 -04:00
Nico Weber 3454970903 LibGfx/JBIG2: Extract composite_bitbuffer() and add some features
This extracts the bitbuffer combining code we had into a new function
composite_bitbuffer() and adds the following features:

* Real support for combination operators (which also lets us allow black
  as background color again, even if that's never used in practice)
* Clipping support (not used here yet, but will be needed elsewhere
  soon)
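
Roughly, the two features combine as in the sketch below. This is not the code from this commit: `SketchBuffer`, `combine()` and `composite()` are made-up names, and the operator list (OR, AND, XOR, XNOR, REPLACE) is the set the JBIG2 spec defines as far as I know.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class CombinationOperator { Or, And, Xor, XNor, Replace };

// Hypothetical bilevel buffer; one byte per pixel (0 or 1), row-major.
struct SketchBuffer {
    int width;
    int height;
    std::vector<uint8_t> bits;

    SketchBuffer(int w, int h, uint8_t fill = 0)
        : width(w), height(h), bits(static_cast<size_t>(w) * static_cast<size_t>(h), fill) {}
    uint8_t get(int x, int y) const { return bits[static_cast<size_t>(y) * width + x]; }
    void set(int x, int y, uint8_t value) { bits[static_cast<size_t>(y) * width + x] = value; }
};

// Combines one destination pixel with one source pixel.
uint8_t combine(uint8_t destination, uint8_t source, CombinationOperator op)
{
    switch (op) {
    case CombinationOperator::Or:
        return static_cast<uint8_t>(destination | source);
    case CombinationOperator::And:
        return static_cast<uint8_t>(destination & source);
    case CombinationOperator::Xor:
        return static_cast<uint8_t>(destination ^ source);
    case CombinationOperator::XNor:
        return static_cast<uint8_t>((destination ^ source) ^ 1);
    case CombinationOperator::Replace:
        return source;
    }
    return source;
}

// Composites `source` onto `destination` at the given offset. Offsets may be
// negative or run past the edges; such pixels are clipped away.
void composite(SketchBuffer& destination, SketchBuffer const& source, int x_offset, int y_offset,
    CombinationOperator op)
{
    for (int y = 0; y < source.height; ++y) {
        int dy = y + y_offset;
        if (dy < 0 || dy >= destination.height)
            continue;
        for (int x = 0; x < source.width; ++x) {
            int dx = x + x_offset;
            if (dx < 0 || dx >= destination.width)
                continue;
            destination.set(dx, dy, combine(destination.get(dx, dy), source.get(x, y), op));
        }
    }
}
```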

We're going to need this for text segment handling.

No behavior change.
2024-03-23 17:30:15 -04:00
Nico Weber 754e1b46fc LibGfx/JBIG2: Implement basic symbol segment processing
A symbol segment defines a bunch of small bitmaps and associates them
with numeric IDs.

This only implements reading symbols encoded with the arithmetic coder.
It does not support huffman coding. (In practice, everything seems to
use arithmetic coding.)

Support for refinement or aggregate coding isn't implemented yet.
Support for retaining bitmap coding contexts isn't implemented yet.
Support for symbol segments referring to other symbol segments isn't
implemented yet.
But all produce diagnostics if encountered, so we won't forget about
them. (I haven't seen any of them being used in the wild.)

No visible behavior change yet, but with JBIG2_DEBUG turned on,
it produces all kinds of debug output.
2024-03-23 17:30:15 -04:00
Nico Weber 93fcb529cf LibGfx/JBIG2: Move SegmentData down a bit
Symbol segments will store decoded symbols, and for that SegmentData
needs to come after BitBuffer.

No behavior change.
2024-03-23 17:30:15 -04:00
Nico Weber 2099ca48a1 LibGfx/JBIG2: Pass in decoder and contexts to generic region decoder
The symbol segment decoding procedure will read generic regions
that aren't at a byte boundary, and that share contexts across
several regions.

No behavior change.
2024-03-23 17:30:15 -04:00
Nico Weber 376b1a2309 LibGfx/JBIG2: Have just one CombinationOperator enum class
We already had two, and we would need another one for text segments.

No behavior change.
2024-03-23 17:30:15 -04:00
Nico Weber c06110da87 LibGfx/JBIG2: Make AdaptiveTemplatePixel toplevel
We're going to need it for symbol segment decoding too.

No behavior change.
2024-03-23 17:30:15 -04:00
Nico Weber 8e82c2b932 LibGfx/JBIG2: Add arithmetic integer decoder
The existing ArithmeticDecoder (from Annex E) reads one bit at a
time.

ArithmeticIntegerDecoder (from Annex A) builds on top of that to
read integer values.
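
As a rough illustration of the layering (not the code in this commit), the sketch below follows my reading of the Annex A.2 procedure: a sign bit, up to five prefix bits selecting a range, then the value bits, with the bits read so far forming the context index. `BitSource` and `decode_integer()` are hypothetical names, and the range table should be checked against the spec before relying on it.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical stand-in for the bit-at-a-time decoder: given a context
// index, it returns the next decoded bit.
using BitSource = std::function<bool(uint16_t context)>;

struct IntegerRange {
    int bit_count;   // number of value bits that follow the prefix
    uint32_t offset; // added to the value bits
};

// Assumed from Annex A: 0..3, 4..19, 20..83, 84..339, 340..4435, 4436 and up.
static constexpr IntegerRange integer_ranges[] = {
    { 2, 0 }, { 4, 4 }, { 6, 20 }, { 8, 84 }, { 12, 340 }, { 32, 4436 },
};

// Decodes one integer. An empty optional stands for the out-of-band value.
std::optional<int32_t> decode_integer(BitSource const& next_bit)
{
    uint16_t previous = 1;
    auto read_bit = [&]() -> uint32_t {
        bool bit = next_bit(previous);
        // The context index is built from the bits decoded so far.
        if (previous < 256)
            previous = static_cast<uint16_t>((previous << 1) | bit);
        else
            previous = static_cast<uint16_t>((((previous << 1) | bit) & 511) | 256);
        return bit ? 1u : 0u;
    };

    bool is_negative = read_bit() != 0;

    // Each 1 prefix bit moves on to the next (larger) range.
    size_t range_index = 0;
    while (range_index < 5 && read_bit() != 0)
        ++range_index;

    uint32_t magnitude = 0;
    for (int i = 0; i < integer_ranges[range_index].bit_count; ++i)
        magnitude = (magnitude << 1) | read_bit();
    magnitude += integer_ranges[range_index].offset;

    if (!is_negative)
        return static_cast<int32_t>(magnitude);
    if (magnitude > 0)
        return -static_cast<int32_t>(magnitude);
    return std::nullopt; // "negative zero" encodes out-of-band
}
```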

This will be used by both the symbol segment and the text segment
readers.

(This does not yet implement the IAID decoding procedure in A.3.
We only need that one in the text segment decoder at the moment,
and it's pretty small, so I'll put it inline there for now.)

Not used yet, so no behavior change yet.
2024-03-23 17:30:15 -04:00
Nico Weber c99506da7d LibGfx/JBIG2: Initialize POD members
And use Array<> instead of C-style arrays.
2024-03-23 17:30:15 -04:00
Timothy Flynn 8af140fd7b Ladybird: Use Core::Resource to locate the emoji lookup path
The path we were using is no longer correct, and we've been silently
dropping this error. Use Core::Resource instead, which we use for most
other Ladybird resources. This would have made it much more obvious that
emoji were not installed with the application.
2024-03-23 17:26:31 -04:00
Timothy Flynn 7463b31754 Ladybird: Ensure emoji files are installed into the Ladybird bundle
Otherwise, we are unable to render emoji on websites.
2024-03-23 17:26:31 -04:00
Timothy Flynn 91cd43a7ac Meta: Add a file containing a list of all emoji file names
And add a verification step to the emoji data generator to ensure all
emoji are listed in this file. This file will be used as a sources list
in both the CMake and GN build systems.

It is probably possible to generate this list. But in a first attempt,
the CMake code to set the file as a dependency of a pseudo target, which
would then parse the file and install the listed emoji, was getting quite
verbose and complicated. So for now, let's just maintain this list.
2024-03-23 17:26:31 -04:00
Timothy Flynn 2f85620b43 Meta: Ensure we install resource files when those resource files change 2024-03-23 17:26:31 -04:00
Timothy Flynn a729677561 Meta: Port recent changes to the GN build
6c26ff567e
e800605ad3
cf7c933312
2024-03-23 17:26:31 -04:00
Timothy Flynn feddecde5b LibWeb: Emit the current token before EOF on invalid comments
The spec for each of these states says:

    -> EOF:
    This is an eof-in-comment parse error. Emit the current comment
    token. Emit an end-of-file token.

We were neglecting to emit the current comment token before emitting an
EOF token. Note the existing EMIT_CURRENT_TOKEN macro was unused.
2024-03-23 20:58:31 +01:00
Timothy Flynn 775282f9fc LibWebView: Stop tokenizing the source HTML once we hit an EOF token 2024-03-23 20:58:31 +01:00
Aliaksandr Kalenik 26a516c85f LibWeb: Allow any FC type for replaced boxes in dimension_box_on_line()
If a box is sized as replaced, it could still be anything, not only SVG.

This fixes crashing on https://www.shopify.com/ that was caused by a
missing paintable for a box that has a layout node. This occurred
because the box was not laid out in dimension_box_on_line().
2024-03-23 20:57:05 +01:00
Tim Ledbetter e1fbb08747 LibWeb: Avoid division by zero when calculating box aspect ratio 2024-03-23 20:56:26 +01:00
Tim Ledbetter 2227674b91 LibWeb: Don't crash when updating a select with detached option elements
`Node::shadow_including_root()` was missing a null check, which caused
a crash when manipulating a select element, whose option elements were
initially detached.
2024-03-23 20:56:26 +01:00
Tim Ledbetter 521a1be97f LibWeb: Don't crash when querying the CDataSection.assignedSlot property 2024-03-23 20:56:26 +01:00
Tim Ledbetter 3518f39b60 LibWeb: Don't crash when querying detached circle element properties 2024-03-23 20:56:26 +01:00
Dan Klishch cab0cb5b13 JSSpecCompiler: Use AK::enumerate in CFGBuildingPass::on_entry 2024-03-23 09:02:58 -04:00
Dan Klishch 45a0ba2167 AK: Introduce AK::enumerate
Co-Authored-By: Tim Flynn <trflynn89@pm.me>
2024-03-23 09:02:58 -04:00
forchane 2d11fc6d44 LibJS: Rename ToSecondsStringPrecision to ToSecondsStringPrecisionRecord
This is an editorial change in the Temporal spec.

See: https://github.com/tc39/proposal-temporal/commit/60f1052
2024-03-23 08:46:56 -04:00
forchane d2e4da62c8 LibJS: Separate validation of roundingIncrement option
This is an editorial change in the Temporal spec.

See: https://github.com/tc39/proposal-temporal/commit/712c449
2024-03-23 08:45:59 -04:00
Timothy Flynn 7b3ddd5e15 LibWeb: Track fetching-related tasks in FetchController for cancellation
The HTMLMediaElement, for example, contains spec text which states any
ongoing fetch process must be "stopped". The spec does not indicate how
to do this, so our implementation is rather ad-hoc.

Our current implementation may cause a crash in places that assume one
of the fetch algorithms that we set to null is *not* null. For example:

    if (fetch_params.process_response) {
        queue_fetch_task([]() {
            fetch_params.process_response();
        });
    }

If the fetch process is stopped after queuing the fetch task, but
before the fetch task is run, we will crash when running this fetch
algorithm.

We now track queued fetch tasks on the fetch controller. When the fetch
process is stopped, we cancel any such pending task.
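
A simplified sketch of the approach, with a toy task queue in place of the real event loop. `TaskQueue`, `FetchControllerSketch` and their methods are hypothetical names and do not mirror the LibWeb API:

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Toy stand-in for the event loop's task queue.
class TaskQueue {
public:
    using TaskID = uint64_t;

    TaskID enqueue(std::function<void()> task)
    {
        TaskID id = m_next_id++;
        m_tasks.push_back({ id, std::move(task) });
        return id;
    }

    void cancel(TaskID id) { m_cancelled.insert(id); }

    // Runs all queued tasks in order, skipping the cancelled ones.
    void run_all()
    {
        while (!m_tasks.empty()) {
            auto [id, task] = std::move(m_tasks.front());
            m_tasks.pop_front();
            if (m_cancelled.erase(id) == 0)
                task();
        }
    }

private:
    TaskID m_next_id { 0 };
    std::deque<std::pair<TaskID, std::function<void()>>> m_tasks;
    std::unordered_set<TaskID> m_cancelled;
};

class FetchControllerSketch {
public:
    explicit FetchControllerSketch(TaskQueue& queue) : m_queue(queue) {}

    void queue_fetch_task(std::function<void()> algorithm)
    {
        // The running task cannot see its own queue ID, so it carries a
        // separate fetch task ID to remove itself from the pending list.
        uint64_t fetch_task_id = m_next_fetch_task_id++;
        auto task_id = m_queue.enqueue([this, fetch_task_id, algorithm = std::move(algorithm)] {
            m_pending.erase(fetch_task_id);
            algorithm();
        });
        m_pending.emplace(fetch_task_id, task_id);
    }

    // Stopping the fetch cancels every task that has not run yet.
    void stop_fetch()
    {
        for (auto const& [fetch_task_id, task_id] : m_pending)
            m_queue.cancel(task_id);
        m_pending.clear();
    }

private:
    TaskQueue& m_queue;
    uint64_t m_next_fetch_task_id { 0 };
    std::unordered_map<uint64_t, TaskQueue::TaskID> m_pending;
};
```

With this shape, a task queued before `stop_fetch()` never runs afterwards, so it cannot call an algorithm that has since been set to null.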

It is a little bit awkward maintaining a fetch task ID. Ideally, we
could use the underlying task ID throughout. But we do not have access
to the underlying task nor its ID when the task is running, at which
point we need some ID to remove from the pending task list.
2024-03-23 13:45:35 +01:00
Timothy Flynn 4806cf9527 LibWeb: Return the ID of queued global events
This will allow callers to track the event.
2024-03-23 13:45:35 +01:00
Nico Weber 3a50cadddf Tests/LibGfx: Add a jbig2 file using basic symbol and text segments
I created this file using `jbig2` (see below for details), but as
far as I can tell `jbig2` does not produce spec-compliant files:

1. It always writes two 0s for the run lengths that specify how
   many symbols to export at the end of a symbol segment

2. It doesn't write any referred-to segments for text segments.
   I think it's supposed to write a referred-to segment that
   mentions the symbol segment the text segment refers to (?)

I locally tweaked `jbig2` to fix these two defects (*), so the image
added in this commit is correct as best I can tell. It opens fine
using `image` and `jbig2`'s decode mode, and via
`Meta/jbig2_to_pdf.py` in Firefox and Chrome. Without my tweaks,
the image decodes fine with `jbig2`, but not with any of the other
three. The image (in a pdf) does _not_ decode in Preview.app,
either with or without my local `jbig2` tweaks.

*: See the PR adding this image for my local diff.

I created the test image file by running this shell script with
`jbig2` tweaked as described above:

    #!/bin/bash
    set -eu

    I=Build/lagom/bin/image
    S=Tests/LibGfx/test-inputs/bmp/bitmap.bmp

    $I "$S" --crop 232,70,120,250  -o mouth.bmp
    $I "$S" --crop 135,100,100,100 -o nose.bmp
    $I "$S" --crop 50,108,30,30    -o top_eye.bmp
    $I "$S" --crop 60,265,30,30    -o bottom_eye.bmp

    # I then manually converted those to 1bpp using Photoshop
    # (Image->Mode->Grayscale, then Image->Mode->Bitmap...,
    # File->Save As..., bmp) since `jbig2` gets confused by non-1bpp
    # bmp files and `image` can't write 1bpp files :/
    #
    # (I tried `convert ${in} -monochrome ${in}-1bpp.bmp` via
    # https://cancerberosgx.github.io/magic/playground/index.html
    # first, but that produced bmp files that neither Preview.app nor
    # `jbig2` could handle.)
    #
    # -HeightClass: Number of height classes
    # -WidthClass: Maximum number of symbols in one height class
    # -Simple means no refinement; the number after is the symbol's ID
    # The 3 numbers after `-ID` are id, y, x. The `-ID` are sorted by x.
    # -RefCorner 1 means "top left".
    #
    # `jbig2` writes symbol and text segments as specified in the ini
    # file, and then only stores the bits of the input image that aren't
    # already set through symbol and text segments.

    cat << EOF > jbig2-symbol.ini
    -sym -Seg 1
    -sym -file -numClass -HeightClass 3 -WidthClass 2
    -sym -file -numSymbol 4
    -sym -file -Height 250
    -sym -file -Width 120 -Simple 0 mouth-1bpp.bmp
    -sym -file -EndOfHeightClass
    -sym -file -Height 100
    -sym -file -Width 100 -Simple 1 nose-1bpp.bmp
    -sym -file -EndOfHeightClass
    -sym -file -Height 30
    -sym -file -Width 30 -Simple 2 top_eye-1bpp.bmp
    -sym -file -Width 30 -Simple 3 bottom_eye-1bpp.bmp
    -sym -file -EndOfHeightClass
    -sym -Param -Huff_DH 0
    -sym -Param -Huff_DW 0

    -txt -Seg 2
    -txt -Param -numInst 4
        -ID 2 108 50 -ID 3 265 60 -ID 1 100 135 -ID 0 70 232
    -txt -Param -RefCorner 1
    -txt -Param -Xlocation 0
    -txt -Param -Ylocation 0
    -txt -Param -W 399
    -txt -Param -H 400
    EOF

    J=$HOME/Downloads/T-REC-T.88-201808-I\!\!SOFT-ZST-E/Software
    J=$J/JBIG2_SampleSoftware-A20180829/source/jbig2

    $J -i "${S%.bmp}" -f bmp -o symbol -F jb2 -ini jbig2-symbol.ini
2024-03-23 08:18:15 -04:00
LekKit bb5ad12e43 Ports: Update rvvm to 0.6
- New upstream stable version is available
- Networking is now fully stable and enabled by default
- SDL2 backend is now available alongside SDL1, so switch to it
- Fixed a name collision of PAGE_SIZE with Serenity headers
- Disable threaded IO on Serenity for now
- Many other changes and fixes
- See https://github.com/LekKit/RVVM/releases/tag/v0.6 for more
2024-03-23 13:00:44 +01:00
Aliaksandr Kalenik f932d5d825 LibWeb: Look for labeled control in DOM tree instead of layout tree
...because "change" event should be dispatched on control even if it
has "display: none" style.

This change fixes selection in labels dropdown on GitHub's "new issue"
page.
2024-03-23 12:46:37 +01:00
Nico Weber 730876fda9 LibGfx/JPEG: Add a comment to inverse_dct_8x8()
See here:
https://github.com/SerenityOS/serenity/issues/22739#issuecomment-1890599116

No behavior change.
2024-03-23 09:40:29 +01:00
MINAqwq e1598233e1 LibCore: Remove unnecessary or invalid write after child remove 2024-03-22 16:32:39 -04:00
Aliaksandr Kalenik 561e011e07 LibWeb+WebContent+Ladybird: Add ability to paste text from clipboard
Text can be pasted by pressing Ctrl/Cmd+V or by using the button in the
context menu. For now only the Qt client is supported.
2024-03-22 15:47:33 -04:00
Aliaksandr Kalenik d5c6e45dca LibWeb: Change Element::closest() to check if any of selector matches
...instead of checking if all selectors match an element.
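
The difference, sketched with hypothetical stand-in types (`SketchElement` and `Selector` are not LibWeb types; a selector is modeled as a plain predicate):

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, minimal element: a tag name and a parent pointer.
struct SketchElement {
    std::string tag;
    SketchElement const* parent { nullptr };
};

using Selector = std::function<bool(SketchElement const&)>;

// Walks from `start` up through its ancestors and returns the first element
// matched by ANY selector in the list (a selector list like `div, p`).
SketchElement const* closest(SketchElement const& start, std::vector<Selector> const& selector_list)
{
    for (auto const* element = &start; element; element = element->parent) {
        bool matches = std::any_of(selector_list.begin(), selector_list.end(),
            [&](Selector const& selector) { return selector(*element); });
        if (matches)
            return element;
    }
    return nullptr;
}
```

With `std::all_of` in place of `std::any_of`, a list such as `div, p` could never match, since no element is both a `div` and a `p`.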

Fixes bug reduced from GitHub's "new issue" page.
2024-03-22 18:43:46 +01:00
Nico Weber 9bf29356a2 LibGfx/ISOBMFF: Support box header size 0 to mean "until end of data"
JPEG2000 uses this, and as far as I can tell it's also part of
ISO/IEC 14496-12.
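
A sketch of header parsing with that rule, using hypothetical names (`BoxHeader`, `read_box_header()`) and not the LibGfx reader. The 64-bit size used when the 32-bit size field is 1 is included from my understanding of ISO/IEC 14496-12 and is not part of this commit:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

struct BoxHeader {
    uint32_t type;        // FourCC, first character in the highest bits
    size_t header_size;   // bytes used by the header itself
    size_t contents_size; // bytes of payload after the header
};

static uint32_t read_u32_be(uint8_t const* p)
{
    return (static_cast<uint32_t>(p[0]) << 24) | (static_cast<uint32_t>(p[1]) << 16)
        | (static_cast<uint32_t>(p[2]) << 8) | static_cast<uint32_t>(p[3]);
}

// Parses a box header from `data`; `available` is the number of bytes left
// in the stream. Returns nothing if the header is truncated or inconsistent.
std::optional<BoxHeader> read_box_header(uint8_t const* data, size_t available)
{
    if (available < 8)
        return std::nullopt;
    uint64_t size = read_u32_be(data);
    uint32_t type = read_u32_be(data + 4);
    size_t header_size = 8;

    if (size == 1) {
        // Assumed: a 64-bit size follows the type field.
        if (available < 16)
            return std::nullopt;
        size = (static_cast<uint64_t>(read_u32_be(data + 8)) << 32) | read_u32_be(data + 12);
        header_size = 16;
    } else if (size == 0) {
        // Size 0 means the box runs until the end of the data.
        size = available;
    }

    if (size < header_size || size > available)
        return std::nullopt;
    return BoxHeader { type, header_size, static_cast<size_t>(size - header_size) };
}
```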
2024-03-22 18:31:23 +01:00
Nico Weber 0d098211b7 LibRIFF+LibGfx/ISOBMFF: Make ChunkID (de)serialization self-consistent
Previously, ChunkID's from_big_endian_number() and
as_big_endian_number() weren't inverses of each other.

ChunkID::from_big_endian_number() used to take a u32 that contained
`('f' << 24) | ('t' << 16) | ('y' << 8) | 'p'`, that is
'f', 't', 'y', 'p' in memory on big-endian and 'p', 'y', 't', 'f'
on little-endian, and return a ChunkID for 'f', 't', 'y', 'p'.

ChunkID::as_big_endian_number() used to return a u32 that for a
ChunkID storing 'f', 't', 'y', 'p' was always 'f', 't', 'y', 'p'
in memory on both little-endian and big-endian, that is it stored
`('f' << 24) | ('t' << 16) | ('y' << 8) | 'p'` on big-endian and
`('p' << 24) | ('y' << 16) | ('t' << 8) | 'f'` on little-endian.

`ChunkID::from_big_endian_number(0x11223344).as_big_endian_number()`
returned 0x44332211.

This change makes the two methods self-consistent: they now take
and return a u32 that always has the first ChunkID part in the
highest bits of the u32 (`'f' << 24`), and so on. That also means
they return a u32 that in-memory looks different on big-endian
and little-endian. Since that's normal for numbers, this also
renames the two methods to just `from_number()` and `to_number()`.
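
The new semantics can be sketched like this. `FourCC` is a hypothetical stand-in, not the real RIFF::ChunkID, but the two methods follow the rule described above (first character in the highest bits):

```cpp
#include <array>
#include <cstdint>

struct FourCC {
    std::array<char, 4> id;

    // The first character lives in the highest 8 bits of the number.
    static FourCC from_number(uint32_t number)
    {
        return FourCC { {
            static_cast<char>((number >> 24) & 0xff),
            static_cast<char>((number >> 16) & 0xff),
            static_cast<char>((number >> 8) & 0xff),
            static_cast<char>(number & 0xff),
        } };
    }

    // Exact inverse of from_number(), independent of host endianness.
    uint32_t to_number() const
    {
        return (static_cast<uint32_t>(static_cast<uint8_t>(id[0])) << 24)
            | (static_cast<uint32_t>(static_cast<uint8_t>(id[1])) << 16)
            | (static_cast<uint32_t>(static_cast<uint8_t>(id[2])) << 8)
            | static_cast<uint32_t>(static_cast<uint8_t>(id[3]));
    }
};
```

Because only shifts are used and no memory is reinterpreted, `FourCC::from_number(n).to_number() == n` holds on both little-endian and big-endian hosts.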

With the semantics cleared up, change the one use in ISOBMFF to read a
BigEndian for chunk headers and brand codes.  This has the effect of
tags now being printed in the right order.

Before:

```sh
% Build/lagom/bin/isobmff ~/Downloads/sample1.jp2
Unknown Box ('  Pj')
[ 4 bytes ]
('pytf') (version = 0, flags = 0x0)
- major_brand = ' 2pj'
- minor_version = 0
- compatible_brands = { ' 2pj' }
Unknown Box ('h2pj')
[ 37 bytes ]
Unknown Box ('fniu')
[ 92 bytes ]
Unknown Box (' lmx')
[ 2736 bytes ]
Unknown Box ('c2pj')
[ 667336 bytes ]
```

After:

```sh
% Build/lagom/bin/isobmff ~/Downloads/sample1.jp2
hmm 0x11223344 0x11223344
Unknown Box ('jP  ')
[ 4 bytes ]
('ftyp') (version = 0, flags = 0x0)
- major_brand = 'jp2 '
- minor_version = 0
- compatible_brands = { 'jp2 ' }
Unknown Box ('jp2h')
[ 37 bytes ]
Unknown Box ('uinf')
[ 92 bytes ]
Unknown Box ('xml ')
[ 2736 bytes ]
Unknown Box ('jp2c')
[ 667336 bytes ]
```
2024-03-22 18:31:15 +01:00
Nico Weber b43092db46 LibGfx/ISOBMFF: Print only one set of quotes around FourCCs
AK::Formatter<RIFF::ChunkID> (in LibRIFF/ChunkID.h) adds them already,
so don't add them here too.
2024-03-22 18:31:15 +01:00
Ali Mohammad Pur e2bab93fdd LibTLS: Avoid using new event loops when setting up connections
This was causing some racy behaviour in LibHTTP, and just generally
led to really bad stack traces; avoid that by switching to
Core::Promise and using the existing event loop.

Possibly resolves #23524 and #23642.
2024-03-22 18:27:53 +01:00
Andreas Kling afe6abfc09 LibWeb: Use an ancestor filter to quickly reject many CSS selectors
Given a selector like `.foo .bar #baz`, we know that elements with
the class names `foo` and `bar` must be present in the ancestor chain of
the candidate element, or the selector cannot match.

By keeping track of the current ancestor chain during style computation,
and which strings are used in tag names and attribute names, we can do
a quick check before evaluating the selector itself, to see if all the
required ancestors are present.

The way this works:

1. CSS::Selector now has a cache of up to 8 strings that must be present
   in the ancestor chain of a matching element. Note that we actually
   store string *hashes*, not the strings themselves.

2. When Document performs a recursive style update, we now push and pop
   elements to the ancestor chain stack as they are entered and exited.

3. When entering/exiting an ancestor, StyleComputer collects all the
   relevant string hashes from that ancestor element and updates a
   counting bloom filter.

4. Before evaluating a selector, we first check if any of the hashes
   required by the selector are definitely missing from the ancestor
   filter. If so, it cannot be a match, and we reject it immediately.

5. Otherwise, we carry on and evaluate the selector as usual.

I originally tried doing this with a HashMap, but we ended up losing
a huge chunk of the time saved to HashMap instead. As it turns out,
a simple counting bloom filter is way better at handling this.
The cost is a flat 8KB per StyleComputer, and since it's a bloom filter,
false positives are a thing.
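
For illustration only, a counting bloom filter along these lines could look as follows. This is not the LibWeb implementation: the class and function names are made up, and the layout (8192 one-byte counters, two index functions, saturating counters) is an assumption chosen to arrive at 8KB.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

class CountingBloomFilterSketch {
public:
    void increment(uint32_t hash)
    {
        bump_up(first_index(hash));
        bump_up(second_index(hash));
    }

    void decrement(uint32_t hash)
    {
        bump_down(first_index(hash));
        bump_down(second_index(hash));
    }

    // False means "definitely absent"; true means "possibly present".
    bool may_contain(uint32_t hash) const
    {
        return m_counters[first_index(hash)] != 0 && m_counters[second_index(hash)] != 0;
    }

private:
    static constexpr size_t counter_count = 8192; // 8192 * 1 byte = 8KB
    static size_t first_index(uint32_t hash) { return hash % counter_count; }
    static size_t second_index(uint32_t hash) { return (hash >> 16) % counter_count; }

    // Counters saturate at 255 and then stay there, so a saturated slot can
    // only cause extra false positives, never a false negative.
    void bump_up(size_t index)
    {
        if (m_counters[index] < UINT8_MAX)
            ++m_counters[index];
    }
    void bump_down(size_t index)
    {
        if (m_counters[index] != 0 && m_counters[index] != UINT8_MAX)
            --m_counters[index];
    }

    std::array<uint8_t, counter_count> m_counters {};
};

// Step 4 above: reject if any hash the selector requires is definitely
// missing from the ancestor filter.
bool can_reject_selector(CountingBloomFilterSketch const& ancestor_filter,
    std::vector<uint32_t> const& required_hashes)
{
    for (auto hash : required_hashes) {
        if (!ancestor_filter.may_contain(hash))
            return true;
    }
    return false;
}
```

Entering an ancestor would call `increment()` for each of its relevant string hashes and leaving it would call `decrement()`, which is why plain bits are not enough and counters are needed.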

This is extremely efficient, and allows us to quickly reject the
majority of selectors on many huge websites.

Some example rejection rates:
- https://amazon.com: 77%
- https://github.com/SerenityOS/serenity: 61%
- https://nytimes.com: 57%
- https://store.steampowered.com: 55%
- https://en.wikipedia.org: 45%
- https://youtube.com: 32%
- https://shopify.com: 25%

This also yields a chunky 37% speedup on StyleBench. :^)
2024-03-22 18:27:32 +01:00
Aliaksandr Kalenik e232a84f0e LibWeb: Do not include box's own scroll offset in get_client_rects()
Fixes https://github.com/SerenityOS/serenity/issues/23631
2024-03-22 12:13:59 +01:00
Tim Ledbetter 7b08fd9f72 LibWeb: Simplify String to CORSSettingAttribute value conversion
There's no need to check the "anonymous" case explicitly, as
`CORSSettingAttribute::Anonymous` is the default value.
2024-03-22 11:29:57 +01:00
Tim Ledbetter aabf1a65b1 LibWeb: Align CORSSettingsAttribute values with the specification
This change makes our crossOrigin attribute getter behave the same way
as other browsers.
2024-03-22 11:29:57 +01:00