Commit graph

909 commits

Author SHA1 Message Date
Andreas Kling 5a79c69b02 LibGfx: Make ImageDecoderPlugin::frame() return ErrorOr<>
This is a first step towards better error propagation from image codecs.
2021-11-21 20:22:48 +01:00
Ben Wiederhake b06b54772e Meta+LibUnicode: Provide code point names through library 2021-11-20 00:31:55 +01:00
Timothy Flynn 93ee922027 LibUnicode: Support locales-without-script aliases for ECMA-402
As noted by ECMA-402, if a supported locale contains all of a language,
script, and region subtag, then the implementation must also support the
locale without the script subtag. The most complicated example of this
is the zh-TW locale.

The list of locales in the CLDR database does not include zh-TW or its
maximized zh-Hant-TW variant. Instead, it inlcudes the zh-Hant locale.
However, zh-Hant-TW is listed in the default-content locale list in the
cldr-core package. This defines an alias from zh-Hant-TW to zh-Hant. We
must then also support the zh-Hant-TW alias without the script subtag:
zh-TW. This transitively maps zh-TW to zh-Hant, which is a case quite
heavily tested by test262.
2021-11-19 11:45:35 +01:00
Timothy Flynn 4b535ce1c8 LibUnicode: Stop passing the cldr-core package to UnicodeNumberFormat
This is no longer needed now that this generator isn't parsing the
default-content locales.
2021-11-19 11:45:35 +01:00
Timothy Flynn a13fa15a30 LibUnicode: Generate default-content locales as aliases
Previously, we were just copying the locale data into default-content
locales (for example, copying the "en" data into "en-US"). Instead, we
can just define the default-content locales as aliases to their main
locales.
2021-11-19 11:45:35 +01:00
Timothy Flynn 9d1519e21c LibUnicode: Move GenerateUnicodeData's Alias struct to generator header
This will be used for locale aliases as well. Also rename the "property"
field in this struct to "name", as it no longer is only used for
property aliases.
2021-11-19 11:45:35 +01:00
Andreas Kling 2b866e3c9b LibGfx: Remove ImageDecoderPlugin::bitmap() in favor of frame(index)
To encourage proper support for multi-frame images throughout the
system, get rid of the single-frame convenience bitmap() API.
2021-11-18 21:11:30 +01:00
Andreas Kling 750f1d580a Fuzzers: Use ImageDecoderPlugin::frame() in image decoder fuzzers
Let's work towards getting rid of the first-frame-only bitmap() API.
2021-11-18 21:11:30 +01:00
Andreas Kling 587f9af960 AK: Make JSON parser return ErrorOr<JsonValue> (instead of Optional)
Also add slightly richer parse errors now that we can include a string
literal with returned errors.

This will allow us to use TRY() when working with JSON data.
2021-11-17 00:21:10 +01:00
Timothy Flynn cafb717486 LibUnicode: Parse and generate CLDR unit data for Intl.NumberFormat
The units data is in another CLDR package, cldr-units.
2021-11-16 23:14:09 +00:00
Timothy Flynn c24a350a18 LibUnicode: Ignore U+200F when parsing format identifiers
Noticed this while implementing multiple identifier support. We were
errantly parsing U+200F as a lone identifier in some Hebrew formats.
2021-11-16 23:14:09 +00:00
Timothy Flynn 04b8b87c17 LibJS+LibUnicode: Support multiple identifiers within format pattern
This wasn't the case for compact patterns, but unit patterns can contain
multiple (up to 2, really) identifiers that must each be recognized by
LibJS.

Each generated NumberFormat object now stores an array of identifiers
parsed. The format pattern itself is encoded with the index into this
array for that identifier, e.g. the compact format string "0K" will
become "{number}{compactIdentifier:0}".
2021-11-16 23:14:09 +00:00
Timothy Flynn 3b68370212 LibJS+LibUnicode: Rename the generated compact_identifier to identifier
This field is currently used to store the StringView into the compact
name/symbol in the format string. Units will need to store a similar
field, so rename the field to be more generic, and extract the parser
for it.
2021-11-16 23:14:09 +00:00
Timothy Flynn 1f546476d5 LibJS+LibUnicode: Fix computation of compact pattern exponents
The compact scale of each formatting rule was precomputed in commit:
be69eae651

Using the formula: compact scale = magnitude - pattern scale

This computation was off-by-one.

For example, consider the format key "10000-count-one", which maps to
"00 thousand" in en-US. What we are really after is the exponent that
best represents the string "thousand" for values greater than 10000
and less than 100000 (the next format key). We were previously doing:

    log10(10000) - "00 thousand".count("0") = 2

Which clearly isn't what we want. Instead, if we do:

    log10(10000) + 1 - "00 thousand".count("0") = 3

We get the correct exponent for each format key for each locale.

This commit also renames the generated variable from "compact_scale" to
"exponent" to match the terminology used in ECMA-402.
2021-11-16 00:56:55 +00:00
Timothy Flynn 48d5684780 LibUnicode: Parse compact identifiers and replace them with a format key
For example, in en-US, the decimal, long compact pattern for numbers
between 10,000 and 100,000 is "00 thousand". In that pattern, "thousand"
is the compact identifier, and the generated format pattern is now
"{number} {compactIdentifier}". This also generates that identifier as
its own field in the NumberFormat structure.
2021-11-16 00:56:55 +00:00
Timothy Flynn 30fbb7d9cd LibUnicode: Parse and generate scientific formatting rules 2021-11-14 17:00:35 +00:00
Timothy Flynn 3645f6a0fc LibUnicode: Fix typo in percent format parser
Just by sheer luck this had no actual effect because the decimal format
prefix has the same length as the percent format prefix.
2021-11-14 17:00:35 +00:00
Timothy Flynn 3b7f5af042 LibUnicode: Generate primary and secondary number grouping sizes
Most locales have a single grouping size (the number of integer digits
to be written before inserting a grouping separator). However some have
a primary and secondary size. We parse the primary size as the size used
for the least significant integer digits, and the secondary size for the
most significant.
2021-11-14 10:35:19 +00:00
Timothy Flynn c65dea64bd LibJS+LibUnicode: Don't remove {currency} keys in GetNumberFormatPattern
In order to implement Intl.NumberFormat.prototype.formatToParts, do not
replace {currency} keys in the format pattern before ECMA-402 tells us
to. Otherwise, the array return by formatToParts will not contain the
expected currency key.

Early replacement was done to avoid resolving the currency display more
than once, as it involves a couple of round trips to search through
LibUnicode data. So this adds a non-standard method to NumberFormat to
do this resolution and cache the result.

Another side effect of this change is that LibUnicode must replace unit
format patterns of the form "{0} {1}" during code generation. These were
previously skipped during code generation because LibJS would just
replace the keys with the currency display at runtime. But now that the
currency display injection is delayed, any {0} or {1} keys in the format
pattern will cause PartitionNumberPattern to abort.
2021-11-13 19:01:25 +00:00
Timothy Flynn a701ed52fc LibJS+LibUnicode: Fully implement currency number formatting
Currencies are a bit strange; the layout of currency data in the CLDR is
not particularly compatible with what ECMA-402 expects. For example, the
currency format in the "en" and "ar" locales for the Latin script are:

    en: "¤#,##0.00"
    ar: "¤\u00A0#,##0.00"

Note how the "ar" locale has a non-breaking space after the currency
symbol (¤), but "en" does not. This does not mean that this space will
appear in the "ar"-formatted string, nor does it mean that a space won't
appear in the "en"-formatted string. This is a runtime decision based on
the currency display chosen by the user ("$" vs. "USD" vs. "US dollar")
and other rules in the Unicode TR-35 spec.

ECMA-402 shies away from the nuances here with "implementation-defined"
steps. LibUnicode will store the data parsed from the CLDR however it is
presented; making decisions about spacing, etc. will occur at runtime
based on user input.
2021-11-13 11:52:45 +00:00
Timothy Flynn e9493a2cd5 LibUnicode: Ensure UnicodeNumberFormat is aware of default content
For example, there isn't a unique set of data for the en-US locale;
rather, it defaults to the data for the en locale. See this commit for
much more detail: 357c97dfa8
2021-11-13 11:52:45 +00:00
Timothy Flynn 9421d5c0cf LibUnicode: Generate currency unit-pattern number formats
These are used when formatting a number as currency with a display
option of "name" (e.g. for USD, the name is "US Dollars" in en-US).

These patterns appear in the CLDR in a different manner than other
number formats that are pluralized. They are of the form "{0} {1}",
therefore do not undergo subpattern replacements.
2021-11-13 11:52:45 +00:00
Timothy Flynn 39e031c4dd LibJS+LibUnicode: Generate all styles of currency localizations
Currently, LibUnicode is only parsing and generating the "long" style of
currency display names. However, the CLDR contains "short" and "narrow"
forms as well that need to be handled. Parse these, and update LibJS to
actually respect the "style" option provided by the user for displaying
currencies with Intl.DisplayNames.

Note: There are some discrepencies between the engines on how style is
handled. In particular, running:

new Intl.DisplayNames('en', {type:'currency', style:'narrow'}).of('usd')

Gives:

  SpiderMoney: "USD"
  V8: "US Dollar"
  LibJS: "$"

And running:

new Intl.DisplayNames('en', {type:'currency', style:'short'}).of('usd')

Gives:

  SpiderMonkey: "$"
  V8: "US Dollar"
  LibJS: "$"

My best guess is V8 isn't handling style, and just returning the long
form (which is what LibJS did before this commit). And SpiderMoney can
handle some styles, but if they don't have a value for the requested
style, they fall back to the canonicalized code passed into of().
2021-11-13 11:52:45 +00:00
Timothy Flynn 6cfd63e5bd LibUnicode: Parse numbers in number formats a bit more leniently
The parser was previously expecting number sections within a pattern to
start with "#", but they may also begin with "0".
2021-11-13 11:52:45 +00:00
Daniel Bertalan fe1726521a Meta: Resolve cyclic dependency between LibPthread and libc++
libc++ uses a Pthread condition variable in one of its initialization
functions. This means that Pthread forwarding has to be set up in LibC
before libc++ can be initialized. Also, because LibPthread is written in
C++, (at least some) parts of the C++ standard library have to be linked
against it.

This is a circular dependency, which means that the order in which these
two libraries' initialization functions are called is undefined. In some
cases, libc++ will come first, which will then trigger an assert due to
the missing Pthread forwarding.

This issue isn't necessarily unique to LibPthread, as all libraries that
libc++ depends on exhibit the same circular dependency issue.

The reason why this issue didn't affect the GNU toolchain is that
libstdc++ is always linked statically. If we were to change that, I
believe that we would run into the same issue.
2021-11-13 11:15:33 +00:00
Andreas Kling b189c88ec2 Fuzzers: Use ImageDecoders instead of load_FORMAT_from_memory() wrappers 2021-11-13 00:55:07 +01:00
Timothy Flynn 1f2ac0ab41 LibUnicode: Move number formatting code generator to UnicodeNumberFormat 2021-11-12 20:46:38 +00:00
Timothy Flynn 04e6b43f05 LibUnicode: Move (soon-to-be) common code out of GenerateUnicodeLocale
The data used for number formatting is going to grow quite a bit when
the cldr-units package is parsed. To prevent the generated UnicodeLocale
file from growing outrageously large, the number formatting data can go
into its own file. To prepare for this, move code that will be common
between the generators for UnicodeLocale and UnicodeNumberFormat to the
utility header.
2021-11-12 20:46:38 +00:00
Ali Mohammad Pur c08bfd450b Meta: Update the gdb script for the new RefPtr layout 2021-11-12 13:01:59 +00:00
Timothy Flynn be69eae651 LibUnicode: Precompute the compact scale of each number formatting rule
This will be needed for the ComputeExponentForMagnitude AO for compact
formatting, namely step 5b:

  Let exponent be an implementation- and locale-dependent (ILD) integer
  by which to scale a number of the given magnitude in compact notation
  for the current locale.
2021-11-12 09:17:08 +00:00
Timothy Flynn 230b133ee3 LibUnicode: Parse number formats into zero/positive/negative patterns
A number formatting pattern in the CLDR contains one or two entries,
delimited by a semi-colon. Previously, LibUnicode was just storing the
entire pattern as one string. This changes the generator to split the
pattern on that delimiter and generate the 3 unique patterns expected by
ECMA-402.

The rules for generating the 3 patterns are as follows:

* If the pattern contains 1 entry, it is the zero pattern. The positive
  pattern is the zero pattern prepended with {plusSign}. The negative
  pattern is the zero pattern prepended with {minusSign}.

* If the pattern contains 2 entries, the first is the zero pattern, and
  the second is the negative pattern. The positive pattern is the zero
  pattern prepended with {plusSign}.
2021-11-12 09:17:08 +00:00
Timothy Flynn 1244ebcd4f LibUnicode: Parse and generate standard accounting formatting rules
Also known as "currency-accounting" in some CLDR documentation.
2021-11-12 09:17:08 +00:00
Timothy Flynn 967afc1b84 LibUnicode: Parse and generate standard currency formatting rules 2021-11-12 09:17:08 +00:00
Timothy Flynn bffd73e0d4 LibUnicode: Parse and generate standard decimal formatting rules 2021-11-12 09:17:08 +00:00
Timothy Flynn feb8c22a62 LibUnicode: Parse and generate standard percentage formatting rules 2021-11-12 09:17:08 +00:00
Timothy Flynn 4317a1b552 LibUnicode: Parse and generate compact currency formatting rules 2021-11-12 09:17:08 +00:00
Timothy Flynn 604a596c90 LibUnicode: Parse and generate compact decimal formatting rules 2021-11-12 09:17:08 +00:00
Timothy Flynn 12b468a588 LibUnicode: Begin parsing and generating locale number systems
The number system data in the CLDR contains information on how to format
numbers in a locale-dependent manner. Start parsing this data, beginning
with numeric symbol strings. For example the symbol NaN maps to "NaN" in
the en-US locale, and "非數值" in the zh-Hant locale.
2021-11-12 09:17:08 +00:00
Timothy Flynn d3e83c9934 LibUnicode: Parse alternate default numbering systems
Some locales in the CLDR have alternate default numbering systems listed
under "defaultNumberingSystem-alt-*", e.g.:

    "defaultNumberingSystem": "arab",
    "defaultNumberingSystem-alt-latn": "latn",
    "otherNumberingSystems": {
      "native": "arab"
    },

We were previously only parsing "defaultNumberingSystem" and
"otherNumberingSystems". This odd format appears to be an artifact of
converting from XML.
2021-11-12 09:17:08 +00:00
Timothy Flynn ae66188d43 LibUnicode: Capitialize generated identifiers in lieu of full title case
This isn't particularly important because this generates code that is
quite hidden from outside callers. But when viewing the generated code,
it's a bit nicer to read e.g. enum identifiers such as "MinusSign"
rather than "Minussign".
2021-11-12 09:17:08 +00:00
Ali Mohammad Pur 7d1142e2c8 LibWasm: Implement module validation 2021-11-11 09:20:04 +01:00
Ali Mohammad Pur d0ad7efd9b Meta: Update WebAssembly testsuite branch name
The 'master' branch is no longer updated, they've switched to 'main'.
2021-11-11 09:20:04 +01:00
Andreas Kling 8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Sam Atkins e52f987020 LibWeb: Make property_initial_value() return a NonnullRefPtr
The finale! Users can now be sure that the value is valid, which makes
things simpler.
2021-11-10 21:58:14 +01:00
Sam Atkins 4d42915485 LibWeb: Ensure that CSS initial values are always valid :^)
First off, this verifies that an initial value is always provided in
Properties.json for each property.

Second, it verifies that parsing that initial value succeeds.

This means that a call to `property_initial_value()` will always return
a valid StyleValue. :^)
2021-11-10 21:58:14 +01:00
Tim Schumacher d1eb604896 CMake: Build serenity_lib libraries with a custom SONAME
This allows libraries and binaries to explicitly link against
`<library>.so.serenity`, which avoids some confusion if there are other
libraries with the same name, such as OpenSSL's `libcrypto`.
2021-11-10 14:42:49 +01:00
Tim Schumacher 46aa477b8f CMake: Remove unused serenity_shared_lib function 2021-11-10 14:42:49 +01:00
Sam Atkins 901a990b1b LibWeb: Remove concept of CSS pseudo-properties
We don't need them any more, so they're gone. :^)
2021-11-10 14:38:49 +01:00
Timothy Flynn 03c023d7e9 LibUnicode: Upgrade to CLDR version 40.0.0
Release notes:
https://github.com/unicode-org/cldr-json/releases/tag/40.0.0
2021-11-09 20:44:52 +01:00
Timothy Flynn 357c97dfa8 LibUnicode: Parse the CLDR's defaultContent.json locale list
This file contains the list of locales which default to their parent
locale's values. In the core CLDR dataset, these locales have their own
files, but they are empty (except for identity data). For example:

https://github.com/unicode-org/cldr/blob/main/common/main/en_US.xml

In the JSON export, these files are excluded, so we currently are not
recognizing these locales just by iterating the locale files.

This is a prerequisite for upgrading to CLDR version 40. One of these
default-content locales is the popular "en-US" locale, which defaults to
"en" values. We were previously inferring the existence of this locale
from the "en-US-POSIX" locale (many implementations, including ours,
strip variants such as POSIX). However, v40 removes the "en-US-POSIX"
locale entirely, meaning that without this change, we wouldn't know that
"en-US" exists (we would default to "en").

For more detail on this and other v40 changes, see:
https://cldr.unicode.org/index/downloads/cldr-40#h.nssoo2lq3cba
2021-11-09 20:44:52 +01:00