503 commits

Author SHA1 Message Date
Timothy Flynn 2a7f36b392 LibJS+LibUnicode: Generate unique numeric symbol lists
There are 443 number system objects generated, each of which held an
array of number system symbols. Of those 443 arrays, only 39 are unique.

To uniquely store these, this change moves the generated NumericSymbol
enumeration to the public LibUnicode/NumberFormat.h header with a pre-
defined set of symbols that we need. This is to ensure the generated,
unique arrays are created in a known order with known symbols. While it
is unfortunate to no longer discover these symbols at generation time,
it does allow us to ignore unwanted symbols and perform fewer string-to-
enumeration conversions at lookup time.
2021-12-11 14:17:47 +00:00
Timothy Flynn 9cc323b0b0 LibUnicode: Generate unique NumberFormat lists for each Unit 2021-12-11 14:17:47 +00:00
Timothy Flynn cdbfe01827 LibUnicode: Generate unique NumberFormat lists for each NumberSystem 2021-12-11 14:17:47 +00:00
Timothy Flynn 76af9fae63 LibUnicode: Support storing lists in UniqueStorage for code generators
The evolution of UniqueStorage has been as follows:

1. It was created as UniqueStringStorage to ensure only one copy of each
   unique string is generated. Interested parties stored an index into
   a unique string list, rather than the string itself.
   Commits: f9e605397c and 04e6b43f05

2. It became apparent that non-string structures could also be de-
   duplicated to reduce the size of libunicode.so. UniqueStringStorage
   was generalized to UniqueStorage for this purpose.
   Commit: d8e6beb14f

It's now also apparent that there's heavy duplication of lists of
structures. For example, the NumberFormat generator stores 4 lists of
NumberFormat objects. In total, we currently generate nearly 2,000 lists
of these objects, of which 275 are unique.

This change updates UniqueStorage to support storing lists. The only
change is how the storage is generated - we generate each stored list
individually, then an array storing spans of those lists.
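
As a rough illustration of the idea (not the generator's actual C++ code), the
sketch below de-duplicates both individual values and lists of values. The
class and helper names are hypothetical; only the concept of storing an index
into unique storage comes from the commit message.

```python
class UniqueStorage:
    """Hypothetical stand-in: stores each distinct value once, returns its index."""

    def __init__(self):
        self._index_of = {}
        self._values = []

    def ensure(self, value):
        # Return the existing index if the value was seen before.
        index = self._index_of.get(value)
        if index is None:
            index = len(self._values)
            self._values.append(value)
            self._index_of[value] = index
        return index

    def __len__(self):
        return len(self._values)


formats = UniqueStorage()       # unique format patterns
format_lists = UniqueStorage()  # unique lists of indices into `formats`


def store_list(patterns):
    # A list is stored as a tuple of element indices, so equal lists collapse.
    return format_lists.ensure(tuple(formats.ensure(p) for p in patterns))


a = store_list(["{number}", "{plusSign}{number}", "{minusSign}{number}"])
b = store_list(["{number}", "{plusSign}{number}", "{minusSign}{number}"])
c = store_list(["{number}%"])
```

Here `a` and `b` resolve to the same stored list, so callers only keep a small
index rather than a copy of the list.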
2021-12-11 14:17:47 +00:00
Timothy Flynn a417c23de0 LibUnicode: Parse and generate per-locale day period ranges 2021-12-10 21:27:24 +00:00
Timothy Flynn fa8e881cfa LibUnicode: Parse and generate secondary day period symbols
Generate morning2, afternoon2, evening2, and night2 symbols.
2021-12-10 21:27:24 +00:00
Timothy Flynn 76aab821f4 LibJS+LibUnicode: Rename some Unicode::DayPeriod values
In the CLDR, there aren't "night" values, there are "night1" & "night2"
values. This is for locales which use a different name for nighttime
depending on the hour. For example, the ja locale uses "夜" between the
hours of 19:00 and 23:00, and "夜中" between the hours of 23:00 and
04:00. Our CLDR parser is currently ignoring "night2", so this rename
is to prepare for that.

We could probably come up with better names, but in the end, the API in
LibUnicode will be such that outside callers won't even see Night1, etc.
2021-12-10 21:27:24 +00:00
Timothy Flynn 9d4c4303fd LibUnicode: Parse and generate date time range format patterns 2021-12-09 23:43:04 +00:00
Timothy Flynn fe84a365c2 LibUnicode: Parse and generate format pattern skeletons
Pattern skeletons are more or less the "key" of format patterns. Every
format pattern is assigned a skeleton. Interval patterns (which are not
yet parsed) are also assigned a skeleton - this is used to match them to
an "owning" format pattern. So we will use the skeleton generated here
to match format patterns at runtime with their available interval
patterns.

An alternative approach would be to append interval patterns directly to
their owning format pattern, but this has some drawbacks:

    1. Skeletons aren't totally unique. A skeleton may appear in both
       the "dateFormats" and "availableFormats" objects, in which case
       the same interval formats would be generated more than once.

    2. Otherwise unique format patterns may only differ by the interval
       patterns assigned to them. This would cause the UniqueStorage for
       the format patterns to increase in size, impacting both compile
       times and libunicode.so size.
2021-12-09 23:43:04 +00:00
Timothy Flynn b17c6ab661 LibUnicode: Fix typo in format pattern parser
See: https://unicode.org/reports/tr35/tr35-dates.html#dfst-day
2021-12-09 23:43:04 +00:00
Sam Atkins c9062b4ed5 LibWeb: Remove now-unused CustomStyleValue 2021-12-09 21:30:31 +01:00
Timothy Flynn b76e44f66f LibUnicode: Parse and generate time zone names in long and short form 2021-12-08 11:29:36 +00:00
Timothy Flynn 2bbf8aa24c LibUnicode: Generate era, month, weekday and day period calendar symbols
The parsing in parse_calendar_symbols() might be a bit more verbose than
it really needs to be, but it is to ensure the symbols are generated in
a known order that we can control with enumerations.
2021-12-08 11:29:36 +00:00
Timothy Flynn 9f7c727720 LibJS+LibUnicode: Generate missing patterns with fractionalSecondDigits
TR-35's Matching Skeleton algorithm dictates how user requests including
fractional second digits should be handled when the CLDR format pattern
does not include that field. When the format pattern contains {second},
but does not contain {fractionalSecondDigits}, generate a second pattern
which appends "{decimal}{fractionalSecondDigits}" to the {second} field.
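
A minimal sketch of that rule, assuming patterns are plain strings with
`{field}` placeholders as described above. The function name is hypothetical.

```python
def with_fractional_seconds(pattern):
    """Return the extra pattern to generate, or None if none is needed."""
    if "{second}" not in pattern or "{fractionalSecondDigits}" in pattern:
        return None
    # Append the decimal separator and fractional digits to the seconds field.
    return pattern.replace("{second}", "{second}{decimal}{fractionalSecondDigits}")
```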
2021-12-08 11:29:36 +00:00
Timothy Flynn 6ace4000bf LibJS+LibUnicode: Supply field type in CalendarPattern's for-each method
Some callers will want different behavior depending on what field is
being provided to the callback.
2021-12-08 11:29:36 +00:00
Timothy Flynn 80ea6e664d LibUnicode: Do not set day period format length for {ampm} segments
TR-35 does define lengths for {ampm}, but they are unused by ECMA-402.
To the contrary, defining the day_period length for this segment will
prevent BasicFormatMatcher from ever selecting a pattern that contains
this segment. Instead, ECMA-402 will only use the short length for
{ampm} segments.
2021-12-08 11:29:36 +00:00
Timothy Flynn dfe8d02482 LibUnicode: Generate missing format patterns
TR-35 describes how to combine date, time, and available formats with
date-time format patterns to generate more available format patterns:
https://unicode.org/reports/tr35/tr35-dates.html#Missing_Skeleton_Fields

Use these steps to generate ~400 new patterns for each calendar. These
are required for ECMA-402's BasicFormatMatcher to produce reasonable
results.
2021-12-06 15:46:34 +01:00
Timothy Flynn 439b06bf0f LibUnicode: Fully parse date-time formatting patterns
Similar to NumberFormat, replace the segments of date-time patterns with
partitions that can be split at runtime. Also generate the pattern style
fields for e.g. era, day, hour, etc.
2021-12-06 15:46:34 +01:00
Timothy Flynn 2772606527 LibUnicode: Generate unique calendar pattern structures
Add unique storage for parsed CalendarPattern structures to ensure only
one copy of each structure is generated.

This doesn't have any impact on libunicode.so with the current generated
data. Rather, this prevents the amount of generated data from needlessly
growing astronomically once date-time patterns are fully parsed. There
will be 173,459 patterns parsed, of which only 22,495 (about 12%) are
unique. This change will save a few MB, and will also help compilation
times.
2021-12-06 15:46:34 +01:00
Timothy Flynn 1d735105c3 LibUnicode: Generate per-locale, per-calendar formats out of line
Currently, there's only a handful of entries in these arrays, so it is
not a huge deal to generate them inline with the struct that holds them.
But they will each soon contain a few hundred entries. Generate them out
of line for easier viewing in the generated code.
2021-12-06 15:46:34 +01:00
Timothy Flynn 945ca81dd7 LibUnicode: Generate unique number format structures
Add unique storage for parsed NumberFormat structures to ensure only one
copy of each structure is generated. Reduces libunicode.so on x86 from
13.2 MB to 11.4 MB.
2021-12-06 15:46:34 +01:00
Timothy Flynn d8e6beb14f LibUnicode: Generalize the generators' unique string storage
UniqueStringStorage is used to ensure only one copy of a string will be
generated, and interested parties store just an index into the generated
storage. Generalize this class to allow any* type to be stored uniquely.

* To actually be storable, the type must have both an AK::Format and an
AK::Traits overload available.
2021-12-06 15:46:34 +01:00
Sam Atkins 16e5f24e64 Fuzzers: Cast unused smart-pointer return values to void 2021-12-05 15:31:03 +01:00
Sam Atkins f3d8f80e9c IPCCompiler: Cast return value of synchronous void IPC calls to void
The synchronous call returns a NonnullOwnPtr that we don't use, so we
have to cast to prevent a compiler warning once smart pointers become
[[nodiscard]].
2021-12-05 15:31:03 +01:00
Timothy Flynn bf79c73158 LibUnicode: Do not generate data for "generic" calendars
This is not a calendar supported by ECMA-402, so let's not waste space
with its data.

Further, don't generate "gregorian" as a valid Unicode locale extension
keyword. It's an invalid type identifier, thus cannot be used in locales
such as "en-u-ca-gregorian".
2021-12-01 16:36:26 +00:00
Timothy Flynn 7e6ad172a4 LibUnicode: Support code point names that apply to ranges of code points
For example, consider the following adjacent entries in UnicodeData.txt:

    3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
    4DBF;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;

Our current implementation would assign the display name "CJK Ideograph
Extension A" to code points U+3400 & U+4DBF, but not to the code points
in between. Not only should those code points be assigned a name, but
the Unicode spec also has formatting rules on what the names should be
(the names for these ranged code points are not as they appear in
UnicodeData.txt).

The spec also defines names for code point ranges that actually are
listed individually in UnicodeData.txt. For example:

    2F800;CJK COMPATIBILITY IDEOGRAPH-2F800;Lo;0;L;4E3D;;;;N;;;;;
    2F801;CJK COMPATIBILITY IDEOGRAPH-2F801;Lo;0;L;4E38;;;;N;;;;;
    2F802;CJK COMPATIBILITY IDEOGRAPH-2F802;Lo;0;L;4E41;;;;N;;;;;

Code points are only coalesced into a range if all fields after the name
are equivalent. Our parser will insert the range and its name formatting
pattern when it comes across the first code point in that range, then
ignore other code points in that range. This reduces the number of names
we generated by nearly 2,000.
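
The First/Last coalescing can be sketched roughly as follows. This is an
illustration only: the function names are hypothetical, and the sketch returns
the raw range label rather than applying the spec's name formatting rules.

```python
def parse_ranges(lines):
    """Collect (first, last, label) tuples from '<..., First>' / '<..., Last>' pairs."""
    ranges = []
    first = None
    for line in lines:
        fields = line.split(";")
        code_point, name = int(fields[0], 16), fields[1]
        if name.endswith(", First>"):
            # Strip the leading '<' and the trailing ', First>'.
            first = (code_point, name[1:-len(", First>")])
        elif name.endswith(", Last>") and first is not None:
            ranges.append((first[0], code_point, first[1]))
            first = None
    return ranges


def range_label(ranges, code_point):
    """Look up the label of the range containing code_point, if any."""
    for start, end, label in ranges:
        if start <= code_point <= end:
            return label
    return None


ranges = parse_ranges([
    "3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;",
    "4DBF;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;",
])
```

With this, a code point strictly between the two listed entries, such as
U+3401, also resolves to the range.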
2021-11-30 11:24:02 +01:00
Timothy Flynn f2f4980f15 LibUnicode: Remove unused field from UnicodeData generator 2021-11-30 11:24:02 +01:00
Timothy Flynn 71903ea7e1 LibUnicode: Parse and generate calendar (ca) Unicode keywords
Also removes a few fly-by "StringView x = nullptr;" unnecessary
initializers.
2021-11-29 22:48:46 +00:00
Timothy Flynn 48ce72e472 LibUnicode: Parse and generate regional hour cycles
Unlike most data in the CLDR, hour cycles are not stored on a per-locale
basis. Instead, they are keyed by a string that is usually a region, but
sometimes is a locale. Therefore, given a locale, to determine the hour
cycles for that locale, we:

    1. Check if the locale itself is assigned hour cycles.
    2. If the locale has a region, check if that region is assigned hour
       cycles.
    3. Otherwise, maximize that locale, and if the maximized locale has
       a region, check if that region is assigned hour cycles.
    4. If the above all fail, fall back to the "001" region.

Further, each locale's default hour cycle is the first assigned hour
cycle.
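
The lookup order above could be sketched like this. The tables passed in are
made-up sample data, the region detection is deliberately simplified, and all
names are hypothetical rather than LibUnicode's API.

```python
def region_of(locale):
    """Simplified: a 2-letter or 3-digit subtag after the language is the region."""
    for subtag in locale.split("-")[1:]:
        if (len(subtag) == 2 and subtag.isalpha()) or (len(subtag) == 3 and subtag.isdigit()):
            return subtag.upper()
    return None


def hour_cycles_for(locale, hour_cycles, likely_subtags):
    # 1. The locale itself may be assigned hour cycles.
    if locale in hour_cycles:
        return hour_cycles[locale]
    # 2. Use the locale's region; 3. otherwise the maximized locale's region.
    region = region_of(locale)
    if region is None:
        region = region_of(likely_subtags.get(locale, locale))
    if region in hour_cycles:
        return hour_cycles[region]
    # 4. Fall back to the "001" (world) region.
    return hour_cycles["001"]


# Sample data for illustration only; not taken from the CLDR.
sample_cycles = {"001": ["h23"], "US": ["h12", "h23"], "JP": ["h23", "h11"], "en-HK": ["h12"]}
sample_likely = {"ja": "ja-Jpan-JP"}
```

The default hour cycle for a locale is then the first element of the returned
list.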
2021-11-29 22:48:46 +00:00
Timothy Flynn 15fc03ef34 LibUnicode: Sort generated enums case-insensitively
This hasn't mattered yet by chance, because the source for all enums
contains names of the same case. But the enum generated for hour cycle
regions will have mixed case. Sort them case-insensitively in order to
traverse these names in the same order in both generate_enum and
generate_mapping.
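
To see why mixed case matters, compare a plain sort with a case-insensitive
one. The names below are sample values chosen for illustration.

```python
region_names = ["ZW", "en_HK", "AD", "001"]

# A plain sort orders by code point, so uppercase sorts before lowercase.
case_sensitive = sorted(region_names)

# Sorting on the lowercased name gives one stable order to share everywhere.
case_insensitive = sorted(region_names, key=str.lower)
```

Using the same key in every place that walks the names keeps the enum and the
mapping in step.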
2021-11-29 22:48:46 +00:00
Timothy Flynn 7872934861 LibUnicode: Parse and generate available candidate format patterns
These formats are used by ECMA-402 when neither a date nor time style is
specified. In that case, these patterns are searched for a best match.
2021-11-29 22:48:46 +00:00
Timothy Flynn 287d43f4be LibUnicode: Hard-code an alias from the Gregorian calendar to Gregory
This alias exists because the name "Gregorian" is too long to be used in
a locale identifier, i.e. "en-u-ca-gregorian" is invalid. Aliases for
calendars are defined here:
https://github.com/unicode-org/cldr-json/blob/main/cldr-json/cldr-bcp47/bcp47/calendar.json

However, CLDR version 40 neglected to actually include the cldr-bcp47
package in its release, so we don't have access to this data. So for now
hard-code this alias so that JavaScript can actually access it. See:
https://unicode-org.atlassian.net/browse/CLDR-15158
2021-11-29 22:48:46 +00:00
Timothy Flynn f471ecdbe9 LibUnicode: Parse and generate date, time, and date-time format patterns 2021-11-29 22:48:46 +00:00
Timothy Flynn 5c57341672 LibUnicode: Create a nearly empty generator for date-time formatting
Similar to number formatting, the data for date-time formatting will be
located in its own generated file. This extracts the cldr-dates package
from the CLDR and sets up the generator plumbing to create the date-time
data files.
2021-11-29 22:48:46 +00:00
Timothy Flynn 914675e826 LibJS+LibUnicode: Separate number formatting methods from Locale.h
Currently, we generate separate data files for locale and number format
related tables/methods, but provide public accessors for all of the data
in one Locale.h file. Rather than continuing this trend for date-time,
relative time, etc. formatting, it's a bit easier to reason about if the
public accessors are also in separate files.
2021-11-29 22:48:46 +00:00
Hendiadyoin1 7a27ecc135 Tests: Add a simple LibGL render-test
At the moment we just check if we *can* render a simple triangle, we do
not yet actually test if the image is indeed the triangle we wanted.

This test also outputs the rendered image when GL_DEBUG is enabled to a
file called "picture.bmp" for manual verification.

Co-authored-by: sunverwerth <s.unverwerth@serenityos.org>
2021-11-29 23:17:05 +03:30
Hendiadyoin1 3a4dd5ff87 Lagom: Add LibGL to the libraries 2021-11-29 23:17:05 +03:30
Hendiadyoin1 849089c406 Lagom: Disable implicit-const-int-float-conversion warnings 2021-11-29 23:17:05 +03:30
Andreas Kling cb9cac4e40 LibIPC+IPCCompiler+AK: Make IPC value decoders return ErrorOr<void>
This allows us to use TRY() in decoding helpers, leading to a nice
reduction in line count.
2021-11-28 23:14:19 +01:00
Andreas Kling 8d76eb773f LibIPC: Make IPC::Connection::post_message() return ErrorOr 2021-11-28 23:14:18 +01:00
kleines Filmröllchen 96d02a3e75 LibAudio: New error propagation API in Loader and Buffer
Previously, a libc-like out-of-line error information was used in the
loader and its plugins. Now, all functions that may fail to do their job
return some sort of Result. The universally-used error type is the new
LoaderError, which can contain information about the general error
category (such as file format, I/O, unimplemented features), an error
description, and location information, such as file index or sample
index.

Additionally, the loader plugins try to do as little work as possible in
their constructors. Right after being constructed, a user should call
initialize() and check the errors returned from there. (This is done
transparently by Loader itself.) If a constructor caused an error, the
call to initialize should check and return it immediately.

This opportunity was used to rework a lot of the internal error
propagation in both loader classes, especially FlacLoader. Therefore, a
couple of other refactorings may have sneaked in as well.

The adoption of LibAudio users is minimal. Piano's adoption is not
important, as the code will receive major refactoring in the near future
anyway. SoundPlayer's adoption is also less important, as changes to
refactor it are in the works as well. aplay's adoption is the best and
may serve as an example for other users. It also includes new buffering
behavior.

Buffer also gets some attention, making it OOM-safe and thereby also
propagating its errors to the user.
2021-11-28 13:33:51 -08:00
Timothy Flynn 0aa3e5c2ea LibUnicode: Port generator utility methods to ErrorOr
Most of these were VERIFY-ing for success, but propagating an error
message up to serenity_main() is much nicer than just a SIGABRT.
2021-11-23 22:58:05 +01:00
Timothy Flynn 55e0b91d8d LibUnicode: Port GenerateUnicodeNumberFormat to ErrorOr and LibMain 2021-11-23 22:58:05 +01:00
Timothy Flynn 8c5f19f7c8 LibUnicode: Port GenerateUnicodeLocale to ErrorOr and LibMain 2021-11-23 22:58:05 +01:00
Timothy Flynn 88dbf3c348 LibUnicode: Port GenerateUnicodeData to ErrorOr and LibMain
Also store command line arguments as StringViews rather than pointers.
2021-11-23 22:58:05 +01:00
Timothy Flynn 4c4b752ab8 Meta: Allow lagom_tool invocations to specify libraries to link 2021-11-23 22:58:05 +01:00
Timothy Flynn a2ea704d21 Meta: Define LagomMain outside of the BUILD_LAGOM branch
This allows code generators to use LagomMain. Otherwise, during CI, they
are built during the superbuild without BUILD_LAGOM=ON.
2021-11-23 22:58:05 +01:00
Timothy Flynn 0e80c1ee6b LibUnicode: Invoke lagom_tool() with SOURCES inline 2021-11-23 22:58:05 +01:00
Linus Groh cfecfbb214 js: Port to LibMain :^)
This wasn't particularly difficult, and there's not much use for the
nicer interface yet either. While unveil() is of limited use in js(1)
as it should be able to open arbitrary files, I feel like we should be
able to add a pledge() call.
2021-11-22 23:07:43 +01:00
Linus Groh ba0f89a4d1 Lagom: Add LibMain as a lagom_lib() 2021-11-22 23:07:43 +01:00
Andreas Kling 5a79c69b02 LibGfx: Make ImageDecoderPlugin::frame() return ErrorOr<>
This is a first step towards better error propagation from image codecs.
2021-11-21 20:22:48 +01:00
Ben Wiederhake b06b54772e Meta+LibUnicode: Provide code point names through library 2021-11-20 00:31:55 +01:00
Timothy Flynn 93ee922027 LibUnicode: Support locales-without-script aliases for ECMA-402
As noted by ECMA-402, if a supported locale contains all of a language,
script, and region subtag, then the implementation must also support the
locale without the script subtag. The most complicated example of this
is the zh-TW locale.

The list of locales in the CLDR database does not include zh-TW or its
maximized zh-Hant-TW variant. Instead, it includes the zh-Hant locale.
However, zh-Hant-TW is listed in the default-content locale list in the
cldr-core package. This defines an alias from zh-Hant-TW to zh-Hant. We
must then also support the zh-Hant-TW alias without the script subtag:
zh-TW. This transitively maps zh-TW to zh-Hant, which is a case quite
heavily tested by test262.
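
A small sketch of the alias derivation, assuming aliases are kept in a simple
mapping from alias to target locale. The function names and the mapping shape
are hypothetical.

```python
def remove_script(locale):
    """Return language-region for a language-script-region locale, else None."""
    parts = locale.split("-")
    if len(parts) == 3 and len(parts[1]) == 4:
        return f"{parts[0]}-{parts[2]}"
    return None


def add_scriptless_aliases(aliases):
    """Extend aliases so each language-script-region alias also works without script."""
    result = dict(aliases)
    for alias, target in aliases.items():
        short = remove_script(alias)
        if short is not None and short not in result:
            # The scriptless alias maps to the same target as the full alias.
            result[short] = target
    return result


aliases = add_scriptless_aliases({"zh-Hant-TW": "zh-Hant"})
```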
2021-11-19 11:45:35 +01:00
Timothy Flynn 4b535ce1c8 LibUnicode: Stop passing the cldr-core package to UnicodeNumberFormat
This is no longer needed now that this generator isn't parsing the
default-content locales.
2021-11-19 11:45:35 +01:00
Timothy Flynn a13fa15a30 LibUnicode: Generate default-content locales as aliases
Previously, we were just copying the locale data into default-content
locales (for example, copying the "en" data into "en-US"). Instead, we
can just define the default-content locales as aliases to their main
locales.
2021-11-19 11:45:35 +01:00
Timothy Flynn 9d1519e21c LibUnicode: Move GenerateUnicodeData's Alias struct to generator header
This will be used for locale aliases as well. Also rename the "property"
field in this struct to "name", as it no longer is only used for
property aliases.
2021-11-19 11:45:35 +01:00
Andreas Kling 2b866e3c9b LibGfx: Remove ImageDecoderPlugin::bitmap() in favor of frame(index)
To encourage proper support for multi-frame images throughout the
system, get rid of the single-frame convenience bitmap() API.
2021-11-18 21:11:30 +01:00
Andreas Kling 750f1d580a Fuzzers: Use ImageDecoderPlugin::frame() in image decoder fuzzers
Let's work towards getting rid of the first-frame-only bitmap() API.
2021-11-18 21:11:30 +01:00
Andreas Kling 587f9af960 AK: Make JSON parser return ErrorOr<JsonValue> (instead of Optional)
Also add slightly richer parse errors now that we can include a string
literal with returned errors.

This will allow us to use TRY() when working with JSON data.
2021-11-17 00:21:10 +01:00
Timothy Flynn cafb717486 LibUnicode: Parse and generate CLDR unit data for Intl.NumberFormat
The units data is in another CLDR package, cldr-units.
2021-11-16 23:14:09 +00:00
Timothy Flynn c24a350a18 LibUnicode: Ignore U+200F when parsing format identifiers
Noticed this while implementing multiple identifier support. We were
errantly parsing U+200F as a lone identifier in some Hebrew formats.
2021-11-16 23:14:09 +00:00
Timothy Flynn 04b8b87c17 LibJS+LibUnicode: Support multiple identifiers within format pattern
This wasn't the case for compact patterns, but unit patterns can contain
multiple (up to 2, really) identifiers that must each be recognized by
LibJS.

Each generated NumberFormat object now stores an array of identifiers
parsed. The format pattern itself is encoded with the index into this
array for that identifier, e.g. the compact format string "0K" will
become "{number}{compactIdentifier:0}".
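
The encoding can be sketched as below for compact patterns. This is a
simplification with a hypothetical function name: it treats every run of
non-zero, non-space characters as one identifier.

```python
import re


def parse_compact_pattern(pattern):
    """Replace digit runs with {number} and literal runs with indexed identifiers."""
    identifiers = []

    def replace(match):
        token = match.group(0)
        if token.strip("0") == "":
            return "{number}"
        identifiers.append(token)
        return "{compactIdentifier:%d}" % (len(identifiers) - 1)

    return re.sub(r"0+|[^0\s]+", replace, pattern), identifiers
```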
2021-11-16 23:14:09 +00:00
Timothy Flynn 3b68370212 LibJS+LibUnicode: Rename the generated compact_identifier to identifier
This field is currently used to store the StringView into the compact
name/symbol in the format string. Units will need to store a similar
field, so rename the field to be more generic, and extract the parser
for it.
2021-11-16 23:14:09 +00:00
Timothy Flynn 1f546476d5 LibJS+LibUnicode: Fix computation of compact pattern exponents
The compact scale of each formatting rule was precomputed in commit:
be69eae651

Using the formula: compact scale = magnitude - pattern scale

This computation was off-by-one.

For example, consider the format key "10000-count-one", which maps to
"00 thousand" in en-US. What we are really after is the exponent that
best represents the string "thousand" for values greater than 10000
and less than 100000 (the next format key). We were previously doing:

    log10(10000) - "00 thousand".count("0") = 2

Which clearly isn't what we want. Instead, if we do:

    log10(10000) + 1 - "00 thousand".count("0") = 3

We get the correct exponent for each format key for each locale.
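
The corrected formula can be written out as a short sketch. The function name
is hypothetical, and the magnitude is taken from the digit count of the format
key, which equals log10 for the powers of ten used as keys.

```python
def compact_exponent(format_key, pattern):
    """exponent = log10(magnitude) + 1 - number of '0' digits in the pattern."""
    magnitude_digits = format_key.split("-")[0]  # e.g. "10000"
    log10_magnitude = len(magnitude_digits) - 1  # exact for powers of ten
    return log10_magnitude + 1 - pattern.count("0")
```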

This commit also renames the generated variable from "compact_scale" to
"exponent" to match the terminology used in ECMA-402.
2021-11-16 00:56:55 +00:00
Timothy Flynn 48d5684780 LibUnicode: Parse compact identifiers and replace them with a format key
For example, in en-US, the decimal, long compact pattern for numbers
between 10,000 and 100,000 is "00 thousand". In that pattern, "thousand"
is the compact identifier, and the generated format pattern is now
"{number} {compactIdentifier}". This also generates that identifier as
its own field in the NumberFormat structure.
2021-11-16 00:56:55 +00:00
Timothy Flynn 30fbb7d9cd LibUnicode: Parse and generate scientific formatting rules 2021-11-14 17:00:35 +00:00
Timothy Flynn 3645f6a0fc LibUnicode: Fix typo in percent format parser
Just by sheer luck this had no actual effect because the decimal format
prefix has the same length as the percent format prefix.
2021-11-14 17:00:35 +00:00
Timothy Flynn 3b7f5af042 LibUnicode: Generate primary and secondary number grouping sizes
Most locales have a single grouping size (the number of integer digits
to be written before inserting a grouping separator). However some have
a primary and secondary size. We parse the primary size as the size used
for the least significant integer digits, and the secondary size for the
most significant.
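
As an illustration, the sketch below reads the two sizes from a CLDR-style
pattern and applies them to a digit string. Both function names are
hypothetical, and the parsing is a simplified reading of the pattern syntax
rather than the generator's code.

```python
def grouping_sizes(pattern):
    """Return (primary, secondary) grouping sizes from a number pattern."""
    groups = pattern.split(".")[0].split(",")
    if len(groups) == 1:
        return 0, 0  # no grouping separator in the pattern
    primary = len(groups[-1])
    secondary = len(groups[-2]) if len(groups) > 2 else primary
    return primary, secondary


def group_digits(digits, primary, secondary, separator=","):
    """Insert separators: primary size for the last group, secondary for the rest."""
    if primary == 0 or len(digits) <= primary:
        return digits
    head, parts = digits[:-primary], [digits[-primary:]]
    while head:
        parts.append(head[-secondary:])
        head = head[:-secondary]
    return separator.join(reversed(parts))
```

For example, sizes of 3 and 2 turn "1234567" into "12,34,567", while a single
size of 3 gives "1,234,567".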
2021-11-14 10:35:19 +00:00
Timothy Flynn c65dea64bd LibJS+LibUnicode: Don't remove {currency} keys in GetNumberFormatPattern
In order to implement Intl.NumberFormat.prototype.formatToParts, do not
replace {currency} keys in the format pattern before ECMA-402 tells us
to. Otherwise, the array returned by formatToParts will not contain the
expected currency key.

Early replacement was done to avoid resolving the currency display more
than once, as it involves a couple of round trips to search through
LibUnicode data. So this adds a non-standard method to NumberFormat to
do this resolution and cache the result.

Another side effect of this change is that LibUnicode must replace unit
format patterns of the form "{0} {1}" during code generation. These were
previously skipped during code generation because LibJS would just
replace the keys with the currency display at runtime. But now that the
currency display injection is delayed, any {0} or {1} keys in the format
pattern will cause PartitionNumberPattern to abort.
2021-11-13 19:01:25 +00:00
Timothy Flynn a701ed52fc LibJS+LibUnicode: Fully implement currency number formatting
Currencies are a bit strange; the layout of currency data in the CLDR is
not particularly compatible with what ECMA-402 expects. For example, the
currency format in the "en" and "ar" locales for the Latin script are:

    en: "¤#,##0.00"
    ar: "¤\u00A0#,##0.00"

Note how the "ar" locale has a non-breaking space after the currency
symbol (¤), but "en" does not. This does not mean that this space will
appear in the "ar"-formatted string, nor does it mean that a space won't
appear in the "en"-formatted string. This is a runtime decision based on
the currency display chosen by the user ("$" vs. "USD" vs. "US dollar")
and other rules in the Unicode TR-35 spec.

ECMA-402 shies away from the nuances here with "implementation-defined"
steps. LibUnicode will store the data parsed from the CLDR however it is
presented; making decisions about spacing, etc. will occur at runtime
based on user input.
2021-11-13 11:52:45 +00:00
Timothy Flynn e9493a2cd5 LibUnicode: Ensure UnicodeNumberFormat is aware of default content
For example, there isn't a unique set of data for the en-US locale;
rather, it defaults to the data for the en locale. See this commit for
much more detail: 357c97dfa8
2021-11-13 11:52:45 +00:00
Timothy Flynn 9421d5c0cf LibUnicode: Generate currency unit-pattern number formats
These are used when formatting a number as currency with a display
option of "name" (e.g. for USD, the name is "US Dollars" in en-US).

These patterns appear in the CLDR in a different manner than other
number formats that are pluralized. They are of the form "{0} {1}",
therefore do not undergo subpattern replacements.
2021-11-13 11:52:45 +00:00
Timothy Flynn 39e031c4dd LibJS+LibUnicode: Generate all styles of currency localizations
Currently, LibUnicode is only parsing and generating the "long" style of
currency display names. However, the CLDR contains "short" and "narrow"
forms as well that need to be handled. Parse these, and update LibJS to
actually respect the "style" option provided by the user for displaying
currencies with Intl.DisplayNames.

Note: There are some discrepancies between the engines on how style is
handled. In particular, running:

new Intl.DisplayNames('en', {type:'currency', style:'narrow'}).of('usd')

Gives:

  SpiderMonkey: "USD"
  V8: "US Dollar"
  LibJS: "$"

And running:

new Intl.DisplayNames('en', {type:'currency', style:'short'}).of('usd')

Gives:

  SpiderMonkey: "$"
  V8: "US Dollar"
  LibJS: "$"

My best guess is V8 isn't handling style, and just returning the long
form (which is what LibJS did before this commit). And SpiderMonkey can
handle some styles, but if they don't have a value for the requested
style, they fall back to the canonicalized code passed into of().
2021-11-13 11:52:45 +00:00
Timothy Flynn 6cfd63e5bd LibUnicode: Parse numbers in number formats a bit more leniently
The parser was previously expecting number sections within a pattern to
start with "#", but they may also begin with "0".
2021-11-13 11:52:45 +00:00
Andreas Kling b189c88ec2 Fuzzers: Use ImageDecoders instead of load_FORMAT_from_memory() wrappers 2021-11-13 00:55:07 +01:00
Timothy Flynn 1f2ac0ab41 LibUnicode: Move number formatting code generator to UnicodeNumberFormat 2021-11-12 20:46:38 +00:00
Timothy Flynn 04e6b43f05 LibUnicode: Move (soon-to-be) common code out of GenerateUnicodeLocale
The data used for number formatting is going to grow quite a bit when
the cldr-units package is parsed. To prevent the generated UnicodeLocale
file from growing outrageously large, the number formatting data can go
into its own file. To prepare for this, move code that will be common
between the generators for UnicodeLocale and UnicodeNumberFormat to the
utility header.
2021-11-12 20:46:38 +00:00
Timothy Flynn be69eae651 LibUnicode: Precompute the compact scale of each number formatting rule
This will be needed for the ComputeExponentForMagnitude AO for compact
formatting, namely step 5b:

  Let exponent be an implementation- and locale-dependent (ILD) integer
  by which to scale a number of the given magnitude in compact notation
  for the current locale.
2021-11-12 09:17:08 +00:00
Timothy Flynn 230b133ee3 LibUnicode: Parse number formats into zero/positive/negative patterns
A number formatting pattern in the CLDR contains one or two entries,
delimited by a semi-colon. Previously, LibUnicode was just storing the
entire pattern as one string. This changes the generator to split the
pattern on that delimiter and generate the 3 unique patterns expected by
ECMA-402.

The rules for generating the 3 patterns are as follows:

* If the pattern contains 1 entry, it is the zero pattern. The positive
  pattern is the zero pattern prepended with {plusSign}. The negative
  pattern is the zero pattern prepended with {minusSign}.

* If the pattern contains 2 entries, the first is the zero pattern, and
  the second is the negative pattern. The positive pattern is the zero
  pattern prepended with {plusSign}.
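
The two rules above translate fairly directly into code. A minimal sketch,
with a hypothetical function name:

```python
def split_number_pattern(pattern):
    """Return the (zero, positive, negative) patterns for a CLDR number pattern."""
    entries = pattern.split(";")
    zero = entries[0]
    positive = "{plusSign}" + zero
    # A second entry, when present, is the explicit negative pattern.
    negative = entries[1] if len(entries) == 2 else "{minusSign}" + zero
    return zero, positive, negative
```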
2021-11-12 09:17:08 +00:00
Timothy Flynn 1244ebcd4f LibUnicode: Parse and generate standard accounting formatting rules
Also known as "currency-accounting" in some CLDR documentation.
2021-11-12 09:17:08 +00:00
Timothy Flynn 967afc1b84 LibUnicode: Parse and generate standard currency formatting rules 2021-11-12 09:17:08 +00:00
Timothy Flynn bffd73e0d4 LibUnicode: Parse and generate standard decimal formatting rules 2021-11-12 09:17:08 +00:00
Timothy Flynn feb8c22a62 LibUnicode: Parse and generate standard percentage formatting rules 2021-11-12 09:17:08 +00:00
Timothy Flynn 4317a1b552 LibUnicode: Parse and generate compact currency formatting rules 2021-11-12 09:17:08 +00:00
Timothy Flynn 604a596c90 LibUnicode: Parse and generate compact decimal formatting rules 2021-11-12 09:17:08 +00:00
Timothy Flynn 12b468a588 LibUnicode: Begin parsing and generating locale number systems
The number system data in the CLDR contains information on how to format
numbers in a locale-dependent manner. Start parsing this data, beginning
with numeric symbol strings. For example the symbol NaN maps to "NaN" in
the en-US locale, and "非數值" in the zh-Hant locale.
2021-11-12 09:17:08 +00:00
Timothy Flynn d3e83c9934 LibUnicode: Parse alternate default numbering systems
Some locales in the CLDR have alternate default numbering systems listed
under "defaultNumberingSystem-alt-*", e.g.:

    "defaultNumberingSystem": "arab",
    "defaultNumberingSystem-alt-latn": "latn",
    "otherNumberingSystems": {
      "native": "arab"
    },

We were previously only parsing "defaultNumberingSystem" and
"otherNumberingSystems". This odd format appears to be an artifact of
converting from XML.
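
A sketch of collecting all three kinds of keys from the JSON object shown
above. The function name is hypothetical, and the input is assumed to be the
already-decoded object.

```python
def numbering_systems(numbers):
    """Collect default, alternate default, and other numbering systems in order."""
    systems = []
    for key, value in numbers.items():
        if key == "defaultNumberingSystem" or key.startswith("defaultNumberingSystem-alt-"):
            systems.append(value)
        elif key == "otherNumberingSystems":
            systems.extend(value.values())
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(systems))


example = {
    "defaultNumberingSystem": "arab",
    "defaultNumberingSystem-alt-latn": "latn",
    "otherNumberingSystems": {"native": "arab"},
}
```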
2021-11-12 09:17:08 +00:00
Timothy Flynn ae66188d43 LibUnicode: Capitalize generated identifiers in lieu of full title case
This isn't particularly important because this generates code that is
quite hidden from outside callers. But when viewing the generated code,
it's a bit nicer to read e.g. enum identifiers such as "MinusSign"
rather than "Minussign".
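
The difference is easy to reproduce outside the generator. In this sketch,
Python's `str.title()` stands in for full title casing and the helper name is
hypothetical.

```python
def capitalize_first(identifier):
    """Uppercase only the first character, leaving the rest untouched."""
    return identifier[:1].upper() + identifier[1:]


title_cased = "minusSign".title()            # lowercases the inner "S"
capitalized = capitalize_first("minusSign")  # keeps the inner "S"
```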
2021-11-12 09:17:08 +00:00
Andreas Kling 8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Sam Atkins e52f987020 LibWeb: Make property_initial_value() return a NonnullRefPtr
The finale! Users can now be sure that the value is valid, which makes
things simpler.
2021-11-10 21:58:14 +01:00
Sam Atkins 4d42915485 LibWeb: Ensure that CSS initial values are always valid :^)
First off, this verifies that an initial value is always provided in
Properties.json for each property.

Second, it verifies that parsing that initial value succeeds.

This means that a call to `property_initial_value()` will always return
a valid StyleValue. :^)
2021-11-10 21:58:14 +01:00
Sam Atkins 901a990b1b LibWeb: Remove concept of CSS pseudo-properties
We don't need them any more, so they're gone. :^)
2021-11-10 14:38:49 +01:00
Timothy Flynn 357c97dfa8 LibUnicode: Parse the CLDR's defaultContent.json locale list
This file contains the list of locales which default to their parent
locale's values. In the core CLDR dataset, these locales have their own
files, but they are empty (except for identity data). For example:

https://github.com/unicode-org/cldr/blob/main/common/main/en_US.xml

In the JSON export, these files are excluded, so we currently are not
recognizing these locales just by iterating the locale files.

This is a prerequisite for upgrading to CLDR version 40. One of these
default-content locales is the popular "en-US" locale, which defaults to
"en" values. We were previously inferring the existence of this locale
from the "en-US-POSIX" locale (many implementations, including ours,
strip variants such as POSIX). However, v40 removes the "en-US-POSIX"
locale entirely, meaning that without this change, we wouldn't know that
"en-US" exists (we would default to "en").

For more detail on this and other v40 changes, see:
https://cldr.unicode.org/index/downloads/cldr-40#h.nssoo2lq3cba
2021-11-09 20:44:52 +01:00
Ben Wiederhake 585554a245 Meta: Implement checker for IPC magic number collisions 2021-11-05 00:17:01 +03:30
Ben Wiederhake 93356ee3df IPCCompiler: Remove now-unused ability to hardcode magic number 2021-11-05 00:17:01 +03:30
Ben Wiederhake 686efb6737 ConfigureComponents: Reduce duplicated code 2021-11-02 11:36:23 +01:00
Idan Horowitz 19e28d5798 LibWeb: Convert is_named_property_exposed_on_object to ThrowCompletions
This is the last usage of old-style exceptions in the WrapperGenerator.
2021-11-02 10:41:25 +02:00
Timothy Flynn 95e492de59 LibWeb: Convert throw_dom_exception_if_needed() to ThrowCompletionOr
This changes Web::Bindings::throw_dom_exception_if_needed() to return a
JS::ThrowCompletionOr instead of an Optional. This allows callers to
wrap the invocation with a TRY() macro instead of making a follow-up
call to should_return_empty(). Further, this removes all invocations of
vm.exception() in the generated bindings.
2021-10-31 18:51:07 +01:00
Idan Horowitz ae510db72c FuzzilliJS: Convert native functions to ThrowCompletionOr 2021-10-31 18:20:37 +02:00
Timothy Flynn c19c306744 LibWeb: Convert all generated bindings to ThrowCompletionOr
This also required converting URLSearchParams::for_each and the callback
function it invokes to ThrowCompletionOr. With this, the ReturnType enum
used by WrapperGenerator is removed as all callers would be using
ReturnType::Completion.
2021-10-31 15:48:36 +01:00
Brendan Coles 91de60d912 Lagom/Fuzzers: Add fuzzer for PDF document 2021-10-30 10:33:56 -07:00
Andreas Kling 398c181c79 LibJS: Rename PropertyName to PropertyKey
Let's use the same name as the spec. :^)
2021-10-24 17:18:07 +02:00
Ben Wiederhake fc519d43ba Fuzzing: Update build instructions
The project needs clang-12, which is not the default on all systems
(e.g. Debian Testing).
2021-10-23 19:29:59 +01:00
Idan Horowitz db5df26841 LibJS: Convert Array AOs to ThrowCompletionOr 2021-10-22 15:07:04 +03:00
Linus Groh 5832de62fe LibJS: Convert NativeFunction::{call,construct}() to ThrowCompletionOr
Both at the same time because many of them call construct() in call()
and I'm not keen on adding a bunch of temporary plumbing to turn
exceptions into throw completions.
Also changes the return value of construct() to Object* instead of Value
as it always needs to return an object; allowing an arbitrary Value is a
massive foot gun.
2021-10-21 09:02:23 +01:00
Idan Horowitz 40eb3a39d4 LibJS: Rename define_native_function => define_old_native_function
This method will eventually be removed once all native functions are
converted to ThrowCompletionOr.
2021-10-20 12:27:19 +01:00
Idan Horowitz 20163c0584 LibJS: Add ThrowCompletionOr versions of the JS native function macros
The old versions were renamed to JS_DECLARE_OLD_NATIVE_FUNCTION and
JS_DEFINE_OLD_NATIVE_FUNCTION, and will eventually be removed once all
native functions have been converted to the new format.
2021-10-20 12:27:19 +01:00
Sam Atkins 04c0c103e0 LibWeb: Distinguish between integer and number when checking StyleValues 2021-10-19 19:12:09 +02:00
Sam Atkins 450b782c18 LibWeb: Distinguish between length and percentage values
Though most CSS properties accept either, some do not, so distinguishing
between them lets us catch some invalid values at parse time.
2021-10-19 19:12:09 +02:00
Timothy Flynn d24ae8063b LibWeb: Implement DOMTokenList for managing space-separated tokens lists
DOMTokenList is used as the return type of, e.g., the Element.classList
property.
2021-10-18 23:33:56 +02:00
Timothy Flynn 4d8320a49a LibWeb: Add initial support for IDL methods with variadic parameters
Adds support for methods whose last parameter is a variadic DOMString.
We construct a Vector<String> of the remaining arguments to pass to
the C++ implementation.
2021-10-18 23:33:56 +02:00
Idan Horowitz 7bbb92dfe9 LibJS: Convert to_u16() to ThrowCompletionOr 2021-10-18 08:01:38 +03:00
Idan Horowitz cc94bba5c0 LibJS: Convert to_u32() to ThrowCompletionOr 2021-10-18 08:01:38 +03:00
Idan Horowitz f6a5ff7b00 LibJS: Convert to_i32() to ThrowCompletionOr 2021-10-18 08:01:38 +03:00
Timothy Flynn 2a3ac02ef1 LibWeb: Implement (most of) NamedNodeMap to store attributes 2021-10-17 13:51:10 +01:00
Timothy Flynn e01dfaac9a LibWeb: Implement Attribute closer to the spec and with an IDL file
Note our Attribute class is what the spec refers to as just "Attr". The
main differences between the existing implementation and the spec are
just that the spec defines more fields.

Attributes can contain namespace URIs and prefixes. However, note that
these are not parsed in HTML documents unless the document content-type
is XML. So for now, these are initialized to null. Web pages are able to
set the namespace via JavaScript (setAttributeNS), so these fields may
be filled in when the corresponding APIs are implemented.

The main change to be aware of is that an attribute is a node. This has
implications on how attributes are stored in the Element class. Nodes
are non-copyable and non-movable because these constructors are deleted
by the EventTarget base class. This means attributes cannot be stored in
a Vector or HashMap as these containers assume copyability / movability.
So for now, the Vector holding attributes is changed to hold RefPtrs to
attributes instead. This might change when attribute storage is
implemented according to the spec (by way of NamedNodeMap).
2021-10-17 13:51:10 +01:00
Idan Horowitz 1639ed7e0a LibJS: Convert to_double() to ThrowCompletionOr 2021-10-17 12:12:35 +01:00
Luke Wilde cb821e1539 LibWeb: Convert ArrayFromVector wrapper to instead be sequence<T>
This adds the ParameterizedType, as `Vector<String>` doesn't encode the
full type information. It is a separate struct as you can't have
`Vector<Type>` inside of `Type`. This also makes Type RefCounted
because I had to make parse_type return a pointer to make dynamic
casting work correctly.

The reason I made it RefCounted instead of using a NonnullOwnPtr is
because it causes compiler errors that I don't want to figure out right
now.
2021-10-17 01:34:02 +03:00
Nico Weber 4d555e8b95 Lagom: Do not use -fno-semantic-interposition in fuzzer builds
Apparently it breaks the fuzzer build. There's probably a better fix
for this, but for now just unbreak the fuzzer build.
Keep this for non-fuzzer builds though since it's apparently a 17%
speedup for running test262 tests :^)
2021-10-16 14:45:06 +01:00
Nico Weber ec9488a58c Lagom: Build with -fno-semantic-interposition

We build with this in non-lagom builds, and serenity's gcc even adds it
to its CC1_SPEC. Let's use it for lagom too.

Reduces the number of dynamic relocations in liblagom-js.so.0.0.0 (per
`objdump -R`) from 15133 to 14534, and increases its size back to 91M
(95156800 bytes), probably due to more inlining being possible.
This might help perf of lagom binaries.
2021-10-15 21:59:42 +01:00
Nico Weber b11d660ff8 Lagom: Build with -fno-exceptions
We build with this in non-lagom builds, so there's no reason not
to use it in lagom builds as well.

Reduces the size of liblagom-js.so.0.0.0 from 94M to 90M
(from 98352784 to 93831056 bytes to be exact).
2021-10-15 21:59:42 +01:00
Timothy Flynn 3ad159537e LibUnicode: Use u16 for unique string indices instead of size_t
Typically size_t is used for indices, but we can take advantage of the
knowledge that there are only approximately 46K unique strings in the
generated UnicodeLocale.cpp file. Therefore, we can get away with using
u16 to store indices. There is a VERIFY that will fail if we ever exceed
the limits of u16.

On x86_64 builds, this reduces libunicode.so from 9.2 MiB to 7.3 MiB.
On i686 builds, this reduces libunicode.so from 3.9 MiB to 3.3 MiB.

These savings are entirely in the .rodata section of the shared library.
2021-10-15 00:06:18 +01:00
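The index narrowing described in 3ad159537e above can be sketched in standard C++ as follows. This is an illustration under assumptions, not the generated code: StringIndex, to_string_index, WideEntry and NarrowEntry are hypothetical names, and assert stands in for the VERIFY mentioned in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical alias; the commit uses u16 for unique string indices.
using StringIndex = std::uint16_t;

// Narrows a size_t index, failing loudly if it does not fit in 16 bits
// (assert plays the role of the VERIFY mentioned in the commit).
inline StringIndex to_string_index(std::size_t index)
{
    assert(index <= std::numeric_limits<StringIndex>::max());
    return static_cast<StringIndex>(index);
}

// A hypothetical record holding three string references, stored either as
// full-width indices or as narrow indices.
struct WideEntry {
    std::size_t language;
    std::size_t territory;
    std::size_t script;
};

struct NarrowEntry {
    StringIndex language;
    StringIndex territory;
    StringIndex script;
};

// Every record in a generated table shrinks, which is where read-only data
// savings of the kind reported in the commit would come from.
static_assert(sizeof(NarrowEntry) < sizeof(WideEntry));
```

The per-record saving is larger on targets where size_t is 8 bytes than where it is 4 bytes, which is consistent with the x86_64 build shrinking more than the i686 build.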
Timothy Flynn ebe704a03d LibWeb: Stub out a basic IntersectionObserver interface
Note there are a couple of type differences between the spec and the IDL
file added in this commit. For example, we will need to support a type
of Variant to handle spec types such as "(double or sequence<double>)".
But for now, this allows web pages to construct an IntersectionObserver
with any valid type.
2021-10-14 10:32:51 +02:00
Timothy Flynn ff66218631 LibWeb: Allow creating "any" types in IDL with integral default values
This enables defining "any" types in IDL files such as:

    any threshold = 0;

This isn't able to parse decimal values yet.
2021-10-14 10:32:51 +02:00
Timothy Flynn f91d63af83 LibUnicode: Generate enum/alias from-string methods without a HashMap
The *_from_string() and resolve_*_alias() generated methods are the last
remaining users of HashMap in the LibUnicode generated files (read: the
last methods not using compile-time structures). This converts these
methods to use an array containing pairs of hash values to the desired
lookup value.

Because this code generation is the same between GenerateUnicodeData.cpp
and GenerateUnicodeLocale.cpp, this adds a GeneratorUtil.h header to the
LibUnicode generators to contain the method that generates the methods.
2021-10-13 16:38:51 +02:00
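A minimal sketch of the hash-pair lookup described in f91d63af83 above, written in standard C++. Several details are assumptions made for illustration: the Script enum, the FNV-1a hash (the real generator uses the project's own hashing), and the linear scan over the table. The sketch also assumes the hashes in the table are distinct, which the static_assert checks for this tiny example.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical enumeration standing in for a generated one.
enum class Script : std::uint8_t {
    Arab,
    Hant,
    Latn,
};

// 32-bit FNV-1a, chosen here only because it is easy to evaluate at compile time.
constexpr std::uint32_t fnv1a(std::string_view string)
{
    std::uint32_t hash = 2166136261u;
    for (char c : string) {
        hash ^= static_cast<std::uint8_t>(c);
        hash *= 16777619u;
    }
    return hash;
}

struct HashValuePair {
    std::uint32_t hash;
    Script value;
};

// Compile-time table mapping the hash of each name to its enum value.
constexpr std::array<HashValuePair, 3> s_script_table { {
    { fnv1a("Arab"), Script::Arab },
    { fnv1a("Hant"), Script::Hant },
    { fnv1a("Latn"), Script::Latn },
} };

// The lookup compares hashes only, so the table must not contain collisions.
static_assert(fnv1a("Arab") != fnv1a("Hant") && fnv1a("Arab") != fnv1a("Latn")
    && fnv1a("Hant") != fnv1a("Latn"));

std::optional<Script> script_from_string(std::string_view name)
{
    auto const hash = fnv1a(name);
    auto it = std::find_if(s_script_table.begin(), s_script_table.end(),
        [hash](HashValuePair const& entry) { return entry.hash == hash; });

    if (it == s_script_table.end())
        return std::nullopt;
    return it->value;
}
```

Because the table is a constexpr array, no HashMap has to be populated at runtime. A table sorted by hash would additionally allow a binary search instead of the linear scan used here.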
Linus Groh 52976bfac6 LibJS: Convert to_object() to ThrowCompletionOr 2021-10-13 09:55:10 +01:00
Linus Groh 4d8912a92b LibJS: Convert to_string() to ThrowCompletionOr
Also update get_function_name() to use ThrowCompletionOr, but this is
not a standard AO and should be refactored out of existence eventually.
2021-10-13 09:55:10 +01:00
Linus Groh 44e70d1bc0 LibJS+LibWeb: Let WrapperGenerator deal with legacy_null_to_empty_string
This concept is not present in ECMAScript, and it bothers me every time
I see it.
It's only used by WrapperGenerator, and even there only relevant in two
places, so let's fully remove it from LibJS and use a simple ternary
expression instead:

    cpp_name = js_name.is_null() && legacy_null_to_empty_string
        ? String::empty()
        : js_name.to_string(global_object);
2021-10-11 23:36:03 +01:00
Linus Groh 661dd32432 LibWeb: Add support for the Promise<T> IDL type to WrapperGenerator
This includes parsing parameterized types (foo<T>) as well as generating
the appropriate code in generate_wrap_statement() and generate_to_cpp().
2021-10-11 13:30:17 +01:00
Linus Groh 7afd215e95 LibWeb: Initialize IDL any values without default value to undefined
Previously this would generate the following code:

    JS::Value foo_value;
    if (!foo.is_undefined())
        foo_value = foo;

Which is dangerous as we're passing an empty value around, which could
be exposed to user code again. This is fine with "= null", for which it
also generates:

    else
        foo_value = JS::js_null();

So, in summary: a value of type `any`, not `required`, with no default
value and no initializer from user code will now default to undefined
instead of an empty value.
2021-10-11 13:30:17 +01:00
Linus Groh a9a7d65099 LibWeb: Replace heycam.github.io/webidl URLs with webidl.spec.whatwg.org
Web IDL is now a WHATWG standard and the specification was moved
accordingly: https://twitter.com/annevk/status/1445311275026821120

The old URLs now redirect, but let's use canonical ones.
2021-10-11 13:15:16 +01:00
Andreas Kling fdc1c15064 LibWeb: Stub out a basic ResizeObserver interface
This patch establishes scaffolding for the ResizeObserver API.
2021-10-11 00:54:01 +02:00
Andreas Kling 5c9ca5c2dc LibWeb: Stub out a basic Selection interface
This patch establishes scaffolding for the Selection API.
2021-10-11 00:32:19 +02:00
Ben Wiederhake c06a0bae04 Meta: Fix broken external links
Meta/Lagom/ReadMe.md never had any other name; not sure how that typo
happened.

The link to the non-existent directory is especially vexing because the
text goes on to explain that we don't want such a directory to exist.

Found by running markdown-checker, and 'wget'ing all external links.
2021-10-10 15:18:55 -07:00
Ben Wiederhake 3f88d65b78 markdown-checker: New tool that checks document links 2021-10-10 15:18:55 -07:00
Timothy Flynn 597379e864 LibUnicode: Generate and use unique locale-related alias strings
Almost all of these are already in the unique string list.
2021-10-10 22:21:48 +02:00
Timothy Flynn acb7bd917f LibUnicode: Generate and use unique subtag and complex alias strings 2021-10-10 22:21:48 +02:00
Timothy Flynn 3d67f6bd29 LibUnicode: Generate and use unique list-format strings
The list-format strings used for Intl.ListFormat are small, but quite
heavily duplicated. For example, the string "{0}, {1}" appears 6,519
times. Generate unique strings for this data to avoid duplication.
2021-10-10 22:21:48 +02:00
Timothy Flynn f9e605397c LibUnicode: Generate and use a set of unique locale-related strings
In the generated UnicodeLocale.cpp file, there are 296,408 strings for
localizations of languages, territories, scripts, currencies & keywords.
Of these, only 43,848 (14.8%) are actually unique, so there are quite a
large number of duplicated strings.

This generates a single compile-time array to store these strings. The
arrays for the localizations now store an index into this single array
rather than duplicating any strings.
2021-10-10 22:21:48 +02:00
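The de-duplication described in f9e605397c above can be sketched as follows. This is an illustrative stand-in in standard C++, not the generator's implementation: the class name UniqueStrings and its methods ensure, get and size are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the generator's unique string storage.
class UniqueStrings {
public:
    // Returns the index of `value`, storing it only if it has not been seen before.
    std::size_t ensure(std::string_view value)
    {
        auto [it, inserted] = m_indices.try_emplace(std::string(value), m_strings.size());
        if (inserted)
            m_strings.emplace_back(value);
        return it->second;
    }

    // Resolves an index back to the single stored copy of the string.
    std::string_view get(std::size_t index) const
    {
        assert(index < m_strings.size());
        return m_strings[index];
    }

    std::size_t size() const { return m_strings.size(); }

private:
    std::unordered_map<std::string, std::size_t> m_indices;
    std::vector<std::string> m_strings;
};
```

A generator built this way would emit the stored strings once as a single array and have every localization table hold the indices returned by ensure(), so a string that appears thousands of times in the CLDR is stored only once.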
Timothy Flynn 3f0095b57a LibUnicode: Skip unknown languages and territories
Some CLDR languages.json / territories.json files contain localizations
for some languages/territories that are otherwise not present in the CLDR
database. We already don't generate anything in UnicodeLocale.cpp for
these anomalies, but this will stop us from even storing that data in
the generator's memory.

This doesn't affect the output of the generator, but will have an effect
after an upcoming commit to unique-ify all of the strings in the CLDR.
2021-10-10 22:21:48 +02:00
Ben Wiederhake 6d99b7b72e Meta: Re-enable warnings for deprecated copies also for Lagom 2021-10-10 21:21:35 +01:00
Timothy Flynn 79707d83d3 LibUnicode: Stop generating large UnicodeData hash map
The data in this hash map is now available by way of much smaller arrays
and is now unused.
2021-10-10 13:49:37 +02:00
Timothy Flynn d83b262e64 LibUnicode: Generate standalone compile-time array for combining class 2021-10-10 13:49:37 +02:00
Timothy Flynn 9f83774913 LibUnicode: Generate standalone compile-time array for special casing
There are only 112 code points with special casing rules, so this array
is quite small (compared to the size 34,626 UnicodeData hash map that is
also storing this data). Removing all casing rules from UnicodeData will
happen in a subsequent commit.
2021-10-10 13:49:37 +02:00
Timothy Flynn da4b8897a7 LibUnicode: Generate standalone compile-time arrays for simple casing
Currently, all casing information (simple and special) is stored in a
compile-time array of size 34,626, then statically copied to a hash map
at runtime. In an effort to reduce the resulting memory usage, store the
simple casing rules in standalone compile-time arrays. The uppercase map
is size 1,450 and the lowercase map is size 1,433. Any code point not in
a map will implicitly have an identity mapping.
2021-10-10 13:49:37 +02:00
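The implicit identity mapping described in da4b8897a7 above can be sketched like this. The three-entry table is a tiny hypothetical excerpt, and both the struct layout and the binary search are assumptions made for illustration; the commit message does not say how the generated arrays are searched.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

struct CaseMapping {
    std::uint32_t code_point;
    std::uint32_t mapped;
};

// Tiny hypothetical excerpt, sorted by code point. Per the commit message,
// the generated uppercase map has 1,450 entries.
constexpr std::array<CaseMapping, 3> s_uppercase_map { {
    { 0x0061, 0x0041 }, // a -> A
    { 0x00E9, 0x00C9 }, // é -> É
    { 0x03B1, 0x0391 }, // α -> Α
} };

std::uint32_t to_simple_uppercase(std::uint32_t code_point)
{
    // Binary search for the first entry whose code point is not less than the input.
    auto it = std::lower_bound(s_uppercase_map.begin(), s_uppercase_map.end(), code_point,
        [](CaseMapping const& entry, std::uint32_t value) { return entry.code_point < value; });

    if (it != s_uppercase_map.end() && it->code_point == code_point)
        return it->mapped;

    // Any code point not in the map implicitly maps to itself.
    return code_point;
}
```

Only code points whose case actually changes need an entry, which is why the arrays can be far smaller than a table covering every code point.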
Idan Horowitz 9958277317 Meta: Disable -Wmaybe-uninitialized for Lagom 2021-10-07 21:56:03 +03:00
Andreas Kling bf43b0f884 LibWeb: Make IDL-constructed objects aware of their JS wrapper
Having IDL constructors call FooWrapper::create(impl) directly created
a wrapper without telling the impl object about it. This meant that we
had wrapped C++ objects with a null wrapper() pointer.
2021-10-04 12:13:25 +02:00
Linus Groh 2f42675ebd LibJS: Convert ordinary_set_with_own_descriptor() to ThrowCompletionOr 2021-10-04 09:52:15 +01:00
Linus Groh 3be26f56db LibJS: Convert has_own_property() to ThrowCompletionOr 2021-10-03 20:14:03 +01:00
Linus Groh fb443b3fb4 LibJS: Convert create_data_property() to ThrowCompletionOr 2021-10-03 20:14:03 +01:00