1
0
mirror of https://github.com/SerenityOS/serenity synced 2024-07-09 02:50:44 +00:00
Commit Graph

40 Commits

Author SHA1 Message Date
Nico Weber
0374c1eb3b LibPDF: Handle indirect reference resolving during parsing more robustly
If `Document::resolve()` was called during parsing, it'd change the
reader's current position, so the parsing code that called it would
then end up at an unexpected position in the file.

Parser.cpp already had special-case recovery when a stream's length
was stored in an indirect reference.

Commit ead02da98a ("/JBIG2Globals") in #23503 added another case
where we could resolve indirect reference during parsing, but wasn't
aware of having to save and restore the reader position for that.

Put the save/restore code in `DocumentParser::parse_object_with_index`
instead, right before the place that ultimately changes the reader's
position during `Document::resolve`. This fixes `/JBIG2Globals` and
lets us remove the special-case code for `/Length` handling.

Since this is kind of subtle, include a test.
2024-03-19 19:20:01 -04:00
Nico Weber
290c9541de Tests/LibPDF: Add a test for resolving indirect refs during parsing
This test uses a JBIG2Globals with an indirect reference,
and contains an indirect reference for a stream length.
When we parse the main JBIG2 image's stream, we unfilter
its data, which causes these two indirect references to
be resolved during parsing.

I started with the output of

    Meta/jbig2_to_pdf.py -o foo.pdf \
        Tests/LibGfx/test-inputs/jbig2/bitmap.jbig2

and then manually added a

    /DecodeParms <</JBIG2Globals 6 0 R>>

entry pointing to an empty stream, and made that new stream
object's length an indirect reference too for good measure.

I used `mutool clean` to fix up offsets a bit. But that also
removes the indirect reference for a stream's length, so I
manually put that back in and adjusted the offset to the last
object in the xref table and the startxref value.
2024-03-19 19:20:01 -04:00
Nico Weber
f5eb57f6bb Tests/LibPDF: Make standard-14-fonts.pdf 200 units less high
No need for a bunch of whitespace at the bottom.

No behavior change.
2024-02-27 17:42:08 -05:00
Nico Weber
b27aca9300 Tests/LibPDF: Add a benchmark for SampledFunction::evaluate()
Takes 235.6±2.3ms to run over here, with `--benchmark_repetitions 5`.
2024-02-13 19:45:19 +01:00
Nico Weber
9c762b9650 LibPDF+Meta: Use a CMYK ICC profile to convert CMYK to RGB
CMYK data describes which inks a printer should use to print a color.
If a screen should display a color that's supposed to look similar
to what the printer produces, it results in a color very different
to what Color::from_cmyk() produces. (It's also printer-dependent.)

There are many ICC profiles describing printing processes. It doesn't
matter too much which one we use -- most of them look somewhat
similar, and they all look dramatically better than Color::from_cmyk().

This patch adds a function to download a zip file that Adobe offers
on their web site. They even have a page for redistribution:
https://www.adobe.com/support/downloads/iccprofiles/icc_eula_win_dist.html

(That one leads to a broken download though, so this downloads the
end-user version.)

In case we have to move off this download at some point, there are also
a whole bunch of profiles at https://www.color.org/registry/index.xalter
that "may be used, embedded, exchanged, and shared without restriction".

The adobe zip contains a whole bunch of other useful and fun profiles,
so I went with it.

For now, this only unzips the USWebCoatedSWOP.icc file though, and
installs it in ${CMAKE_BINARY_DIR}/Root/res/icc/Adobe/CMYK/. In
Serenity builds, this will make it to /res/icc/Adobe/CMYK in the
disk image. And in lagom build, after #23016 this is the
lagom res staging directory that tools can install via
Core::ResourceImplementation. `pdf` and `MacPDF` already do that,
`TestPDF` now does it too.

The final piece is that LibPDF then loads the profile from there
and uses it for DeviceCMYK color conversions.

(Doing file access from the bowels of a library is a bit weird,
especially in a system that has sandboxing built in. But LibGfx does
that in FontDatabase too already, and LibPDF uses that, so it's not a
new problem.)
2024-02-01 13:42:04 -07:00
Nico Weber
09a91e54c0 Tests: Add a pdf with text rotation
As usual, lovingly hand-written, with offsets fixed up with

    mutool clean Tests/LibPDF/rotate.pdf  Tests/LibPDF/rotate.pdf
2024-01-18 14:01:30 +01:00
Nico Weber
8507151850 Tests/LibPDF: Make text.pdf also use Tz, Tc, Tw, Ts operators
These set the horizontal scale factor, character spacing, word
spacing, and text rise respectively.

Also add a global scale transform, and set a text transform matrix
with a scale for some of the text.
2024-01-18 14:01:30 +01:00
Nico Weber
9a93f677f4 LibPDF: Mark text rendering matrix as dirty after TJ numbers
Mostly because I audited all places that assigned to `m_text_matrix`
after #22760.

This one is very difficult to trigger in practice.

`show_text()` marks the text rendering matrix dirty already,
so this only has an effect if the `TJ` array starts with a
number, and the matrix isn't marked dirty going in.

`Tm` caches the text rendering matrix, so I changed text.pdf
to contain:

```
1 0 0 1 45 130 Tm
[ 200 (Hello) -2000 (World) ] TJ T*
```

This first sets an x offset of 5 (on top of the normal 40), and
then undoes it (`200` is multiplied by font size (25) / -1000,
and `200 * 25 / -1000` is -5). Before this change, the topmost
"Hello World" ended up slightly indented.

Likely no behavior change in practice, but makes the code easier
to understand, and maybe it helps in the wild somewhere.
2024-01-15 08:39:04 +00:00
Nico Weber
b47695c87e Tests: Add a pdf that embeds a wide-gamut 16-bits/channel image
I opened Tests/LibGfx/test-inputs/png/wide-gamut-only.png in
Preview.app and used File->Export as PDF... to convert it to a PDF.

I then ran

    mutool clean -d Tests/LibPDF/wide-gamut-only.pdf \
                    Tests/LibPDF/wide-gamut-only.pdf

to decompress it, edited by hand to remove padding around the image
and shrunk the page's MediaBox to be as big as the image, and ran the
command above again to fix up binary offsets in the xref table.
2024-01-12 16:20:46 -07:00
Nico Weber
4380be9d01 Tests/LibPDF: Add a PDF using the standard 14 fonts
Hand-written (with offsets fixed up by `mutool clean`).
Uses the default encoding for each font.  Manual test for now.

Byte strings generated with:

    python3 -c "for i in range(4):
        print('<' +
              ''.join('%02x' % r for r in range(i * 64, (i + 1) * 64)) +
              '>')"
2024-01-03 10:19:24 +01:00
Nico Weber
9495f64f91 LibPDF: Improve hex string parsing
A local (non-public) PDF I have lying around contains this in
a page's operator stream:

```
[<00b4003e> 3 <002600480051> 3 <005700550044004f0003> -29
<00330044> 3 <0055> -3 <004e0040> 4 <0003> -29 <004c00560003> -31
<0057004b> 4 <00480003> -37 <0050
>] TJ
```

That is, there's a newline in a hexstring after a character.

This led to `Parser error at offset 5184: Unexpected character`.

The spec says in 3.2.3 String Objects, Hexadecimal Strings:
"""Each pair of hexadecimal digits defines one byte of the string.
White-space characters (such as space, tab, carriage return, line feed,
and form feed) are ignored."""

But we didn't ignore whitespace before or after a character, only
in between the bytes.

The spec also says:
"""If the final digit of a hexadecimal string is missing—that is, if
there is an odd number of digits—the final digit is assumed to be 0."""

In that case, we were skipping the closing `>` twice -- or, more
accurately, we ignored the character after it too. This has been
wrong all the way back in #6974.

Add a test that fails if either of the two changes isn't present.
2024-01-02 22:13:21 +01:00
Nico Weber
4107c2985e Tests: Add a PDF rendering test
Having some rendering test coverage is motivated by #22362, but this
test wouldn't have found the crashes over there (since colorspaces.pdf
does not contain pattern color spaces). Still, good to have some
in-repo test coverage of PDF rendering.
2023-12-20 12:45:07 +01:00
Nico Weber
13641693cb LibPDF: Use make_object<>() to make objects
No behavior change.
2023-12-20 12:19:08 +01:00
Ali Mohammad Pur
5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Kyle Pereira
60c7ff9db1 Tests: Add a test for LibPDF pattern rendering
This is based largely on Adobe's EXAMPLE 2 on p176 of the specification,
manually edited slightly and then cleaned up with mutool.

https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf
2023-12-10 16:44:24 +01:00
Nico Weber
57e2b5ef59 LibPDF+Tests: Correctly decode text strings without explicit encoding 2023-11-22 09:08:06 -07:00
Nico Weber
8ee0c75f43 LibPDF: Add (automated!) test for outline encoding
Manually added an Outlines dict with three items, one each for
every text string encoding in its title.

(Preview.app apparently can't handle UTF-8 in outlines either.)
2023-11-22 09:08:06 -07:00
Nico Weber
d345c5b793 LibPDF: Add (automated!) test for info dict encoding
Manually added an info dict with the three text string encoding
methods to encoding.pdf.

(Preview.app apparently can't handle UTF-8 in info dicts!)
2023-11-22 09:08:06 -07:00
Nico Weber
65b895595a LibPDF: Add an encoding test file
For now, this uses UTF-16BE and UTF-8 marked strings in page body text.
These markings should be ignored in body text.

Hand-written, with `set fenc=latin1` and `set binary` in vim, and
xref etc fixed up by running

    mutool clean Tests/LibPDF/encoding.pdf  Tests/LibPDF/encoding.pdf

as usual.
2023-11-22 09:08:06 -07:00
Nico Weber
4c6afd4763 Tests: Install recently added PDF test files
These aren't needed for any automated tests, but it's still nice to have
them in the OS for manual testing.
2023-11-22 09:08:06 -07:00
Nico Weber
eedd978974 Tests: Add a type3 test pdf
A manual test, but better than nothing.

I hand-wrote the file, and used mutool to fix up xref and stream
lengths:

    mutool clean Tests/LibPDF/type3.pdf Tests/LibPDF/type3.pdf

The file contains one `d1` character which per spec shouldn't
contain color statements, and if it does it should be ignored,
and one `d0` character which can contain color.

The text then sets a color before rendering the text.

Per spec, the text color should affect the `d1` character but
not the `d0` one. We get this wrong, but so does Preview.app.
(PDFium gets it right.)

But independent of the colors, just rendering the glyphs at all
at the right position is already good :^)
2023-11-19 22:33:34 +01:00
Nico Weber
4f51ff456e LibPDF: Add a test for inter-word spacing with TJ
Hand-written, based on the text example in Appendix G.2 in
the PDF 1.7 spec, with the xref table fixed up by `mutool clean`:

    mutool clean -dggg Tests/LibPDF/text.pdf Tests/LibPDF/text.pdf
2023-11-14 10:11:09 +01:00
Nico Weber
f4a847894f LibPDF: Make SampledFunction::evaluate() work for n-dimensional input
I didn't find example code for this and the AI assistant did very
poorly on this as well. So I had to write it all by myself!

It can be much more efficient I think, but I think the overall
shape is maybe roughly fine.
2023-11-12 07:55:04 +01:00
Nico Weber
a9ef65e64a LibPDF: For multi-output SampledFunctions, fix output colors
For N outputs, the outputs aren't stored in N independent planes.
Instead, N output values are stored right next to each other in
the stream data.
2023-11-11 08:55:37 +01:00
Nico Weber
ec739460e0 LibPDF: Add test for SampledFunction and fix bugs found by it
* SampledFunction now keeps the StreamObject it gets data from alive
  (doesn't matter too much in practice, but does matter in the test,
  where nothing else keeps the stream alive).

* If a sample is an integer, we would previously sample that value
  twice and then divide by zero when interpolating. Make sure to
  sample 1 unit apart.
2023-11-11 08:55:37 +01:00
Nico Weber
80eec1e16b LibPDF: Implement PostScriptCalculatorFunction
Includes a tokenizer and interpreter for the subset of PostScript
supported in PDF type 4 functions.
2023-11-09 16:06:25 +01:00
Nico Weber
00f1a6cf86 Tests: Add a pdf file for testing color spaces
Covers DeviceGray, CalRGB, DeviceRGB, DeviceCMYK, Lab, CalGray for now.

Does not yet cover Indexed, Pattern, Separation, DeviceN, ICCBased.

Lovingly hand-written, with the xref table fixed up by mutool.
2023-11-04 17:02:37 -04:00
Tim Ledbetter
b4296e1c9b LibPDF: Don't use unsanitized values in error messages
Previously, constructing error messages with unsanitized input could
fail because error message strings must be UTF-8.
2023-10-26 11:05:32 +02:00
Nico Weber
323d76fbb9 LibPDF: Make encrypted object streams work
There were two problems:
1. parse_compressed_object_with_index() parses indirect objects
   without going through Parser::parse_indirect_value(), so
   push_reference() / pop_reference() weren't called.
   Manually call them, both for the indirect object containing
   the object stream and for the indirect object within the
   object stream.
2. The indirect object within the object stream got decrypted
   twice: Once when the object stream data itself got decrypted,
   and then incorrectly a second time when the object data within
   the stream was read. To fix, disable encryption while parsing
   object stream data (since it's already decrypted).

The test is from http://opf-labs.org/format-corpus/pdfCabinetOfHorrors/
which according to readme.md at the same location is CC0.
2023-07-12 17:16:25 +02:00
Nico Weber
6200097bcc Tests/LibPDF: Make encrypted_with_aes test some metadata too 2023-07-12 17:16:25 +02:00
Nico Weber
2061ee2632 Tests/LibPDF: Add test for AES-encrypted PDF
I created this by typing "sup" into TextEdit.app on macOS 13.4,
hitting Cmd-P to bring up the print dialog, clicked the PDF button
at the bottom, changed Title and Author to "sup", clicked
"Security Options…", and checked "Require password to open document"
(with password "sup").

This file tests several things:
- It has a compressed stream as first object. This used to make the
  linearization dict detection logic assert.
- It uses AES as encryption key using version 4 of the encryption
  dict. This used to not be implemented.
2023-07-12 06:28:15 +02:00
Nico Weber
e0887dd045 Tests/LibPDF: Use MUST() more
No behavior change.
2023-07-12 06:28:15 +02:00
Ben Wiederhake
f890b70eae Tests: Prefer TRY_OR_FAIL() and MUST() over EXPECT(!.is_error())
Note that in some cases (in particular SQL::Result and PDFErrorOr),
there is no Formatter defined for the error type, hence TRY_OR_FAIL
cannot work as-is. Furthermore, this commit leaves untouched the places
where MUST could be replaced by TRY_OR_FAIL.

Inspired by:
https://github.com/SerenityOS/serenity/pull/18710#discussion_r1186892445
2023-05-14 15:39:38 -06:00
Sam Atkins
1910dc8976 Tests: Move test PDF files into Tests/LibPDF
Let's put test files with the tests themselves, instead of a random user
directory. (But still copy them so they appear in the user directory
for convenience.)
2023-01-19 11:50:10 +00:00
Linus Groh
6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
sin-ack
3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
Matthew Olsson
5b316462b2 LibPDF: Add implementation of the Standard security handler
Security handlers manage encryption and decription of PDF files. The
standard security handler uses RC4/MD5 to perform its crypto (AES as
well, but that is not yet implemented).
2022-03-29 02:52:57 +02:00
Matthew Olsson
73cf8205b4 LibPDF: Propagate errors in Parser and Document 2022-03-07 10:53:57 +01:00
Simon Woertz
d8013f9c3a Tests: Add test cases for #10702 and #10717
Add test cases for parsing an empty file and a truncated file.
2022-01-08 18:57:55 +01:00
Simon Woertz
07a557194c Tests: Add base structure for LibPDF unit tests
Add a unit test for each sample pdf file that currently exists in the
anon user's `~/Document/pdf` directory.
- linear.pdf
- non-linearized.pdf
- complex.pdf

Each test ensures that the pdf document is parsed and that the page
count is the expected one.
2022-01-08 18:57:55 +01:00