Commit graph

25 commits

Author SHA1 Message Date
Nico Weber 0374c1eb3b LibPDF: Handle indirect reference resolving during parsing more robustly
If `Document::resolve()` was called during parsing, it'd change the
reader's current position, so the parsing code that called it would
then end up at an unexpected position in the file.

Parser.cpp already had special-case recovery when a stream's length
was stored in an indirect reference.

Commit ead02da98a ("/JBIG2Globals") in #23503 added another case
where we could resolve indirect reference during parsing, but wasn't
aware of having to save and restore the reader position for that.

Put the save/restore code in `DocumentParser::parse_object_with_index`
instead, right before the place that ultimately changes the reader's
position during `Document::resolve`. This fixes `/JBIG2Globals` and
lets us remove the special-case code for `/Length` handling.

Since this is kind of subtle, include a test.
2024-03-19 19:20:01 -04:00
Nico Weber 9c762b9650 LibPDF+Meta: Use a CMYK ICC profile to convert CMYK to RGB
CMYK data describes which inks a printer should use to print a color.
If a screen should display a color that's supposed to look similar
to what the printer produces, it results in a color very different
to what Color::from_cmyk() produces. (It's also printer-dependent.)

There are many ICC profiles describing printing processes. It doesn't
matter too much which one we use -- most of them look somewhat
similar, and they all look dramatically better than Color::from_cmyk().

This patch adds a function to download a zip file that Adobe offers
on their web site. They even have a page for redistribution:
https://www.adobe.com/support/downloads/iccprofiles/icc_eula_win_dist.html

(That one leads to a broken download though, so this downloads the
end-user version.)

In case we have to move off this download at some point, there are also
a whole bunch of profiles at https://www.color.org/registry/index.xalter
that "may be used, embedded, exchanged, and shared without restriction".

The adobe zip contains a whole bunch of other useful and fun profiles,
so I went with it.

For now, this only unzips the USWebCoatedSWOP.icc file though, and
installs it in ${CMAKE_BINARY_DIR}/Root/res/icc/Adobe/CMYK/. In
Serenity builds, this will make it to /res/icc/Adobe/CMYK in the
disk image. And in lagom build, after #23016 this is the
lagom res staging directory that tools can install via
Core::ResourceImplementation. `pdf` and `MacPDF` already do that,
`TestPDF` now does it too.

The final piece is that LibPDF then loads the profile from there
and uses it for DeviceCMYK color conversions.

(Doing file access from the bowels of a library is a bit weird,
especially in a system that has sandboxing built in. But LibGfx does
that in FontDatabase too already, and LibPDF uses that, so it's not a
new problem.)
2024-02-01 13:42:04 -07:00
Nico Weber 9495f64f91 LibPDF: Improve hex string parsing
A local (non-public) PDF I have lying around contains this in
a page's operator stream:

```
[<00b4003e> 3 <002600480051> 3 <005700550044004f0003> -29
<00330044> 3 <0055> -3 <004e0040> 4 <0003> -29 <004c00560003> -31
<0057004b> 4 <00480003> -37 <0050
>] TJ
```

That is, there's a newline in a hexstring after a character.

This led to `Parser error at offset 5184: Unexpected character`.

The spec says in 3.2.3 String Objects, Hexadecimal Strings:
"""Each pair of hexadecimal digits defines one byte of the string.
White-space characters (such as space, tab, carriage return, line feed,
and form feed) are ignored."""

But we didn't ignore whitespace before or after a character, only
in between the bytes.

The spec also says:
"""If the final digit of a hexadecimal string is missing—that is, if
there is an odd number of digits—the final digit is assumed to be 0."""

In that case, we were skipping the closing `>` twice -- or, more
accurately, we ignored the character after it too. This has been
wrong all the way back in #6974.

Add a test that fails if either of the two changes isn't present.
2024-01-02 22:13:21 +01:00
Nico Weber 4107c2985e Tests: Add a PDF rendering test
Having some rendering test coverage is motivated by #22362, but this
test wouldn't have found the crashes over there (since colorspaces.pdf
does not contain pattern color spaces). Still, good to have some
in-repo test coverage of PDF rendering.
2023-12-20 12:45:07 +01:00
Nico Weber 13641693cb LibPDF: Use make_object<>() to make objects
No behavior change.
2023-12-20 12:19:08 +01:00
Ali Mohammad Pur 5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Nico Weber 57e2b5ef59 LibPDF+Tests: Correctly decode text strings without explicit encoding 2023-11-22 09:08:06 -07:00
Nico Weber 8ee0c75f43 LibPDF: Add (automated!) test for outline encoding
Manually added an Outlines dict with three items, one each for
every text string encoding in its title.

(Preview.app apparently can't handle UTF-8 in outlines either.)
2023-11-22 09:08:06 -07:00
Nico Weber d345c5b793 LibPDF: Add (automated!) test for info dict encoding
Manually added an info dict with the three text string encoding
methods to encoding.pdf.

(Preview.app apparently can't handle UTF-8 in info dicts!)
2023-11-22 09:08:06 -07:00
Nico Weber f4a847894f LibPDF: Make SampledFunction::evaluate() work for n-dimensional input
I didn't find example code for this and the AI assistant did very
poorly on this as well. So I had to write it all by myself!

It can be much more efficient I think, but I think the overall
shape is maybe roughly fine.
2023-11-12 07:55:04 +01:00
Nico Weber a9ef65e64a LibPDF: For multi-output SampledFunctions, fix output colors
For N outputs, the outputs aren't stored in N independent planes.
Instead, N output values are stored right next to each other in
the stream data.
2023-11-11 08:55:37 +01:00
Nico Weber ec739460e0 LibPDF: Add test for SampledFunction and fix bugs found by it
* SampledFunction now keeps the StreamObject it gets data from alive
  (doesn't matter too much in practice, but does matter in the test,
  where nothing else keeps the stream alive).

* If a sample is an integer, we would previously sample that value
  twice and then divide by zero when interpolating. Make sure to
  sample 1 unit apart.
2023-11-11 08:55:37 +01:00
Nico Weber 80eec1e16b LibPDF: Implement PostScriptCalculatorFunction
Includes a tokenizer and interpreter for the subset of PostScript
supported in PDF type 4 functions.
2023-11-09 16:06:25 +01:00
Tim Ledbetter b4296e1c9b LibPDF: Don't use unsanitized values in error messages
Previously, constructing error messages with unsanitized input could
fail because error message strings must be UTF-8.
2023-10-26 11:05:32 +02:00
Nico Weber 323d76fbb9 LibPDF: Make encrypted object streams work
There were two problems:
1. parse_compressed_object_with_index() parses indirect objects
   without going through Parser::parse_indirect_value(), so
   push_reference() / pop_reference() weren't called.
   Manually call them, both for the indirect object containing
   the object stream and for the indirect object within the
   object stream.
2. The indirect object within the object stream got decrypted
   twice: Once when the object stream data itself got decrypted,
   and then incorrectly a second time when the object data within
   the stream was read. To fix, disable encryption while parsing
   object stream data (since it's already decrypted).

The test is from http://opf-labs.org/format-corpus/pdfCabinetOfHorrors/
which according to readme.md at the same location is CC0.
2023-07-12 17:16:25 +02:00
Nico Weber 6200097bcc Tests/LibPDF: Make encrypted_with_aes test some metadata too 2023-07-12 17:16:25 +02:00
Nico Weber 2061ee2632 Tests/LibPDF: Add test for AES-encrypted PDF
I created this by typing "sup" into TextEdit.app on macOS 13.4,
hitting Cmd-P to bring up the print dialog, clicked the PDF button
at the bottom, changed Title and Author to "sup", clicked
"Security Options…", and checked "Require password to open document"
(with password "sup").

This file tests several things:
- It has a compressed stream as first object. This used to make the
  linearization dict detection logic assert.
- It uses AES as encryption key using version 4 of the encryption
  dict. This used to not be implemented.
2023-07-12 06:28:15 +02:00
Nico Weber e0887dd045 Tests/LibPDF: Use MUST() more
No behavior change.
2023-07-12 06:28:15 +02:00
Ben Wiederhake f890b70eae Tests: Prefer TRY_OR_FAIL() and MUST() over EXPECT(!.is_error())
Note that in some cases (in particular SQL::Result and PDFErrorOr),
there is no Formatter defined for the error type, hence TRY_OR_FAIL
cannot work as-is. Furthermore, this commit leaves untouched the places
where MUST could be replaced by TRY_OR_FAIL.

Inspired by:
https://github.com/SerenityOS/serenity/pull/18710#discussion_r1186892445
2023-05-14 15:39:38 -06:00
Linus Groh 6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
sin-ack 3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
Matthew Olsson 5b316462b2 LibPDF: Add implementation of the Standard security handler
Security handlers manage encryption and decription of PDF files. The
standard security handler uses RC4/MD5 to perform its crypto (AES as
well, but that is not yet implemented).
2022-03-29 02:52:57 +02:00
Matthew Olsson 73cf8205b4 LibPDF: Propagate errors in Parser and Document 2022-03-07 10:53:57 +01:00
Simon Woertz d8013f9c3a Tests: Add test cases for #10702 and #10717
Add test cases for parsing an empty file and a truncated file.
2022-01-08 18:57:55 +01:00
Simon Woertz 07a557194c Tests: Add base structure for LibPDF unit tests
Add a unit test for each sample pdf file that currently exists in the
anon user's `~/Document/pdf` directory.
- linear.pdf
- non-linearized.pdf
- complex.pdf

Each test ensures that the pdf document is parsed and that the page
count is the expected one.
2022-01-08 18:57:55 +01:00