Commit graph

452 commits

Author SHA1 Message Date
Martin Janiczek 65df1cd22c AK: Add randomized tests for Bitmap
These tests found the issue fixed in #21409 (but randomized test runner
wasn't merged yet).
2023-11-18 10:01:29 +01:00
Martin Janiczek a00b4a5c1f AK: Add roundtrip tests for BitStream 2023-11-18 10:01:29 +01:00
Martin Janiczek c802971f93 AK: Add liveness tests for BinarySearch 2023-11-18 10:01:29 +01:00
Martin Janiczek 1452693183 AK: Add randomized tests for BinaryHeap 2023-11-18 10:01:29 +01:00
Martin Janiczek 6c1626e83b AK: Add tests for AllOf and AnyOf 2023-11-18 10:01:29 +01:00
Tim Schumacher a2f60911fe AK: Rename GenericTraits to DefaultTraits
This feels like a more fitting name for something that provides the
default values for Traits.
2023-11-09 10:05:51 -05:00
Lucas CHOLLET b00476abac AK: Use an enum to specify the open mode instead of a bool
Let's replace this bool with an `enum class` in order to enhance
readability. This is done by repurposing `MappedFile`'s `OpenMode` into
a shared `enum` simply called `Mode`.
2023-11-08 18:19:34 +01:00
Tim Ledbetter 2a1fc96650 AK: Avoid unnecessary String allocations for URL username and password
Previously, `URLParser` was constructing a new String for every
character of the URL's username and password. This change improves
performance by eliminating those unnecessary String allocations.

A URL with a 100,000 character password can now be parsed in ~30ms vs
~8 seconds previously on my machine.
2023-11-06 09:19:12 +01:00
Dan Klishch b65d281bbb AK: Add GenericLexer::{consume_decimal_integer,peek_string} 2023-11-04 18:06:30 +01:00
Ali Mohammad Pur 78c04cb8b2 AK+LibPDF: Make Format print floats in a roundtrip-safe way by default
Previously we assumed a default precision of 6, which made the printed
values quite odd in some cases.
This commit changes that default to print them with just enough
precision to produce the exact same float when roundtripped.

This commit adds some new tests that assert exact format outputs, which
have to be modified if we decide to change the default behaviour.
2023-10-31 09:12:35 +03:30
Gurkirat Singh f1b79e0cd3 AK: Implement slugify function for URL slug generation
The slugify function is used to convert input into URL-friendly slugs.
It processes each character in the input, keeping ascii alpha characters
after lowercase and replacing non-alphanum characters with the glue
character or a space if multiple spaces are encountered consecutively.
The resulting string is trimmed of leading and trailing whitespace, and
any internal whitespace is replaced with the glue character.

It is currently used in LibMarkdown headings generation code.
2023-10-30 10:39:59 +00:00
Tim Ledbetter e4715aa82a AK: Use correct type when calculating integral exp2()
Previously, integral `exp2()` would produce the incorrect result for
exponents above 31.
2023-10-27 21:59:44 -04:00
Shannon Booth 3748f1d290 AK: Check for overflow parsing IPv4 number in URL
Fixes OSS fuzz issue:
https://oss-fuzz.com/download?testcase_id=6045676088459264
2023-10-26 11:11:41 +02:00
hanaa12G 54e1470467 AK: Pass correct length to StringUtils::convert_to_floating_point()
Fixed the issue in StringUtils::convert_to_floating_point() where the
end pointer of the trimmed string was not being passed, causing the
function to consistently return 'None' when given strings with trailing
whitespaces.
2023-10-22 00:22:29 +02:00
Jelle Raaijmakers b015926f8e AK: Improve floating point decimals formatting
There were 2 issues with the way we formatted floating point decimals:
if the part after the decimal point exceeded the max of an u64 we would
generate wildly incorrect decimals, and we applied no rounding.

With this new code, we emit decimals one by one and perform a simple
reverse string walk to round the number up if required.
2023-10-18 19:39:30 -04:00
Ali Mohammad Pur aeee98b3a1 AK+Everywhere: Remove the null state of DeprecatedString
This commit removes DeprecatedString's "null" state, and replaces all
its users with one of the following:
- A normal, empty DeprecatedString
- Optional<DeprecatedString>

Note that null states of DeprecatedFlyString/StringView/etc are *not*
affected by this commit. However, DeprecatedString::empty() is now
considered equal to a null StringView.
2023-10-13 18:33:21 +03:30
Martin Janiczek efa5fb5c3a AK: Fix one-off error in BitmapView::find_first and find_one_anywhere
The mentioned functions used m_size / 8 instead of size_in_bytes()
(division with ceiling rounding mode), which resulted in an off-by-one
error such that the functions didn't search in the last not-fully-8-bits
byte.

Using size_in_bytes() instead of m_size / 8 fixes this.
2023-10-11 15:58:16 +02:00
Sam Atkins 253a96277e AK: Add FixedMemoryStream methods for reading values "in place"
When working with FixedMemoryStreams, and especially MappedFiles, you
may don't want to copy the underlying data when you read from the
stream. Pointing into that data is perfectly fine as long as you know
the lifetime of it is long enough.

This commit adds a couple of methods for reading either a single value,
or a span of them, in this way. As noted, for single values you sadly
get a raw pointer instead of a reference, but that's the only option
right now.
2023-10-10 14:36:25 +02:00
Tim Ledbetter f10db48bab AK: Avoid overflow when passing i64 minimum value to vformat() 2023-10-09 09:39:02 +03:30
Tim Ledbetter bf75ecdcf7 Tests/AK: Add FuzzyMatch tests 2023-10-06 22:09:18 +02:00
kleines Filmröllchen 9a026fc8d5 AK: Implement SipHash as the default hash algorithm for most use cases
SipHash is highly HashDoS-resistent, initialized with a random seed at
startup (i.e. non-deterministic) and usable for security-critical use
cases with large enough parameters. We just use it because it's
reasonably secure with parameters 1-3 while having excellent properties
and not being significantly slower than before.
2023-10-01 11:06:36 +03:30
kleines Filmröllchen 5b2496e522 AK: Make writability violation of FixedMemoryStream non-fatal
Writing to a read-only file is not a program-crashing error either, so
we just return the standard EBADFD (see write(2)) here.
2023-09-27 03:22:56 +02:00
Daniel Bertalan 4d2af7c3d6 AK: Implement reverse iterators for OrderedHashTable 2023-09-24 23:36:43 +02:00
kleines Filmröllchen 8bcf25561b AK: Fix MemoryStream seek from end
The seek offset is still applied positively when seeking from the end;
see the Kernel's seek implementation.
2023-09-17 17:13:52 -06:00
Sergey Bugaev e97c2538b6 Tests: Fix building TestSIMD on non-SSE ABIs 2023-09-06 07:21:07 -06:00
Sergey Bugaev 7bce0e5d1a Tests: Fix TestDuration build with 32-bit time_t
There are a bunch of tests that check for time_t handling 64-bit values
properly. This makes sense, but also obviously doesn't work when time_t
is 32-bit, and causes compile-time errors. Compile these tests out in
that case.

Since there's no straightforward way to check sizeof(time_t) inside the
proprocessor, only do this when glibc's __TIMESIZE is defined.
2023-09-06 07:21:07 -06:00
ronak69 2caf68fd03 AK: Add binary and octal mode formatting for FixedPoint 2023-08-29 11:10:45 +02:00
ronak69 352480338e AK: Refactor FixedPoint's formatter
The main change is the simplification of the expression
`(10^precision * fraction) / 2^precision` to `5^precision * fraction`.

Those expressions overflow or not depends on the value of `precision`
and `fraction`. For the maximum value of `fraction`, the following table
shows for which value of `precision` overflow will occur.

            Old   New
    u32      08    10
    u64      15    20
    u128     30    39

As of now `u64` type is used to calculate the result of the expression.
Meaning that before, only FixedPoints with `precision` less than 15
could be accurately rendered (for every value of fraction) in decimal.
Now, this limit gets increased to 20.

This refactor also fixes, broken decimal render for explicitly specified
precision width in format string, and broken hexadecimal render.
2023-08-29 11:10:45 +02:00
ronak69 4c6ea4a963 AK: Fix off-by-one error in round-to-even logic of FixedPoint
Because of the off-by-one error, the second bit of the fraction was
getting ignored in differentiating between fractions equal to 0.5 or
greater than 0.5. This resulted in numbers like 2.75 being considered
as having fraction equal to 0.5 and getting rounded incorrectly (to 2).
2023-08-29 11:10:45 +02:00
Andreas Kling b256e52586 AK: Make Formatter for NonnullOwnPtr<T> format the T
This mirrors the behavior of NonnullRefPtr<T>. If you want to format
the pointer address, call .ptr() on it.
2023-08-25 20:10:47 +02:00
Karol Kosek e575ee4462 AK+Kernel: Unify Traits<T>::equals()'s argument order on different types
There was a small mishmash of argument order, as seen on the table:

                 | Traits<T>::equals(U, T) | Traits<T>::equals(T, U)
   ============= | ======================= | =======================
   uses equals() | HashMap                 | Vector, HashTable
defines equals() | *String[^1]             | ByteBuffer

[^1]: String, DeprecatedString, their Fly-type equivalents and KString.

This mostly meant that you couldn't use a StringView for finding a value
in Vector<String>.

I'm changing the order of arguments to make the trait type itself first
(`Traits<T>::equals(T, U)`), as I think it's more expected and makes us
more consistent with the rest of the functions that put the stored type
first (like StringUtils functions and binary_serach). I've also renamed
the variable name "other" in find functions to "entry" to give more
importance to the value.

With this change, each of the following lines will now compile
successfully:

    Vector<String>().contains_slow("WHF!"sv);
    HashTable<String>().contains("WHF!"sv);
    HashMap<ByteBuffer, int>().contains("WHF!"sv.bytes());
2023-08-23 20:21:09 +02:00
Andreas Kling 0b83717ea2 AK: Make SourceGenerator::fork() infallible 2023-08-22 13:08:24 +02:00
Ali Mohammad Pur 4f0f1c7c72 AK: Add support for Little/BigEndian<UFixedBigInteger<M>> 2023-08-21 13:39:32 +03:30
Hendiadyoin1 394529b7d0 AK: Fix formatting of negative whole fixed point numbers
Instead of `-2` we were printing `-2.1`

Co-Authored-By: Daniel Bertalan <dani@danielbertalan.dev>
2023-08-14 14:20:45 -06:00
Hendiadyoin1 daacc5c6c2 AK: Rename AK::FixedPoint::round to rint and fix a rounding error
`rint` is a more accurate name for the roudning mode as the fixme above
stated
2023-08-14 14:20:45 -06:00
Hendiadyoin1 05c959e40b AK: Change standard casting method of FixedPoint to truncation
This matches what floats do.

Also fix typo `trunk`->`trunc`
2023-08-14 14:20:45 -06:00
Hendiadyoin1 e609ac74a3 AK: Fix FixedPoint multiplication rounding and overflow behaviour
We now preform the multiplication in a widened type which makes it
overflow-safe and use the correct bit for rounding direction detection.
2023-08-14 14:20:45 -06:00
Shannon Booth 9d60f23abc AK: Port URL::m_fragment from DeprecatedString to String 2023-08-13 15:03:53 -06:00
Shannon Booth 21fe86d235 AK: Port URL::m_query from DeprecatedString to String 2023-08-13 15:03:53 -06:00
Lucas CHOLLET fde26c53f0 AK: Remove the API to explicitly construct short strings
Now that ""_string is infallible, the only benefit of explicitly
constructing a short string is the ability to do it at compile-time. But
we never do that, so let's simplify the API and remove this
implementation detail from it.
2023-08-08 07:37:21 +02:00
Lucas CHOLLET 3f35ffb648 Userland: Prefer _string over _short_string
As `_string` can't fail anymore (since 3434412), there are no real
benefits to use the short variant in most cases.
2023-08-08 07:37:21 +02:00
Andreas Kling 25eee91811 AK: Make "foo"_fly_string infallible
Stop worrying about tiny OOMs.

Work towards #20405.
2023-08-07 16:03:27 +02:00
Andreas Kling 34344120f2 AK: Make "foo"_string infallible
Stop worrying about tiny OOMs.

Work towards #20405.
2023-08-07 16:03:27 +02:00
Karol Kosek eb41f0144b AK: Decode data URLs to separate class (and parse like every other URL)
Parsing 'data:' URLs took it's own route. It never set standard URL
fields like path, query or fragment (except for scheme) and instead
gave us separate methods called `data_payload()`, `data_mime_type()`,
and `data_payload_is_base64()`.

Because parsing 'data:' didn't use standard fields, running the
following JS code:

    new URL('#a', 'data:text/plain,hello').toString()

not only cleared the path as URLParser doesn't check for data from
data_payload() function (making the result be 'data:#a'), but it also
crashes the program because we forbid having an empty MIME type when we
serialize to string.

With this change, 'data:' URLs will be parsed like every other URLs.
To decode the 'data:' URL contents, one needs to call process_data_url()
on a URL, which will return a struct containing MIME type with already
decoded data! :^)
2023-08-01 14:19:05 +02:00
Karol Kosek 58017a0581 AK: Clear buffer after leaving CannotBeABaseUrlPath in URLParser
By not clearing the buffer, we were leaking the path part of a URL into
the query for URLs without an authority component (no '//host').

This could be seen most noticeably in mailto: URLs with header fields
set, as the query part of `mailto:user@example.com?subject=test` was
parsed to `user@example.comsubject=test`.

data: URLs didn't have this problem, because we have a special case for
parsing them.
2023-08-01 10:10:07 +02:00
Shannon Booth 8751be09f9 AK: Serialize URL hosts with 'concept-host-serializer'
In order to follow spec text to achieve this, we need to change the
underlying representation of a host in AK::URL to deserialized format.
Before this, we were parsing the host and then immediately serializing
it again.

Making that change resulted in a whole bunch of fallout.

After this change, callers can access the serialized data through
this concept-host-serializer. The functional end result of this
change is that IPv6 hosts are now correctly serialized to be
surrounded with '[' and ']'.
2023-07-31 05:18:51 +02:00
Shannon Booth bf7af25a82 AK: Allow testing Empty instances for equality
This also makes it possible to compare `Variant<Empty, Ts...>`
objects if operator== exists for all Ts
2023-07-28 20:47:48 +03:30
kleines Filmröllchen a0705202ea Kernel/Ext2: Write superblock backups
We don't ever read them out, but this should make fsck a lot less mad.
2023-07-28 14:51:07 +02:00
Shannon Booth 177b04dcfc AK: Fix url host parsing check for 'ends in a number'
I misunderstood the spec step for checking whether the host 'ends with a
number'. We can't simply check for it if ends with a number, this check
is actually an algorithm which is required to avoid detecting hosts that
end with a number from an IPv4 host.

Implement this missing step, and add a test to cover this.
2023-07-25 06:43:50 -04:00
Shannon Booth 8d2ccf0f4f AK: Implement IPV4 host URL parsing to specification
This implements both the parsing and serialization IPV4 parts from
the URL spec.
2023-07-24 17:07:16 -04:00
Andreas Kling f0ec104131 AK: Implement IPv6 host parsing in URLParser
This is just a straight (and fairly inefficient) implementation of IPv6
parsing and serialization from the URL spec.

Note that we don't use AK::IPv6Address here because the URL spec
requires a specific serialization behavior.
2023-07-17 07:47:58 +02:00
Shannon Booth 5625ca5cb9 AK: Rename URLParser::parse to URLParser::basic_parse
To make it more clear that this function implements
'concept-basic-url-parser' instead of 'concept-url-parser'.
2023-07-15 09:45:16 +02:00
Timothy Flynn c911781c21 Everywhere: Remove needless trailing semi-colons after functions
This is a new option in clang-format-16.
2023-07-08 10:32:56 +01:00
Tim Schumacher 60ac254df6 AK: Use hashing to accelerate searching a CircularBuffer 2023-07-06 15:06:20 +01:00
Tim Schumacher 42d01b21d8 AK: Rewrite the hint-based CircularBuffer::find_copy_in_seekback
This now searches the memory in blocks, which should be slightly more
efficient. However, it doesn't make much difference (e.g. ~1% in LZMA
compression) in most real-world applications, as the non-hint function
is more expensive by orders of magnitude.
2023-07-06 15:06:20 +01:00
Tim Schumacher 046a9faeb3 AK: Split up CircularBuffer::find_copy_in_seekback
The "operation modes" of this function have very different focuses, and
trying to combine both in a way where we share the most amount of code
probably results in the worst performance.

Instead, split up the function into "existing distances" and "no
existing distances" so that we can optimize either case separately.
2023-07-06 15:06:20 +01:00
Tim Schumacher 9e82ad758e AK: Move parts for searching CircularBuffer into a new class
We will be adding extra logic to the CircularBuffer to optimize
searching, but this would negatively impact the performance of
CircularBuffer users that don't need that functionality.
2023-07-06 15:06:20 +01:00
Valtteri Koskivuori 5e5493e334 AK: Add URLParser relative file URL test
I was debugging a different issue in Ladybird, and noticed that
completing relative file URLs with URL::complete_url didn't seem to work
right. This test case covers both the working https case, as well as the
file URL case fixed by the previous commit.
2023-06-18 15:16:08 +02:00
Ben Wiederhake 2b2e0a4c5a Tests: Use AK_MAKE_DEFAULT_MOVABLE to avoid mistakes in default impls 2023-06-18 08:47:51 +01:00
Sam Atkins 8e53e5afc4 AK: Propagate errors from SourceGenerator::fork() 2023-06-17 17:48:06 +01:00
Peter Brottveit Bock 49b29332f2 AK: Migrate IPv6Address::to_deprecated_string() to ::to_string()
Change the name and return type of
`IPv6Address::to_deprecated_string()` to `IPv6Address::to_string()`
with return type `ErrorOr<String>`.

It will now propagate errors that occur when writing to the
StringBuilder.

There are two users of `to_deprecated_string()` that now use
`to_string()`:

1. `Formatted<IPv6Address>`: it now propagates errors.

2. `inet_ntop`: it now sets errno to ENOMEM and returns.
2023-06-09 19:38:14 +01:00
Peter Brottveit Bock 43a8a38f57 AK: Test that IPv6Address::operator== only looks at bytes in address
This commit extends the test to ensure that different objects with
the same inet6-address are considered equal.
2023-06-09 19:38:14 +01:00
Tim Schumacher 58b1d9c319 AK: Correctly calculate size of the last AllocatingMemoryStream chunk 2023-05-29 13:30:46 +02:00
Tim Schumacher 52aab50914 AK: Handle empty trailing chunks in AllocatingMemoryStream::offset_of 2023-05-29 13:30:46 +02:00
Tim Schumacher 9a7ae52b31 AK: Expose AllocatingMemoryStream::CHUNK_SIZE
This allows the tests to use that information confidently.
2023-05-29 13:30:46 +02:00
Ben Wiederhake 5fafd82927 AK+Everywhere: Don't crash on invalid months
Sadly, we don't have proper error propagation here. However, crashing
the Kernel just because a CDROM contains an invalid month seems like a
bad idea.
2023-05-27 12:17:50 +02:00
Ben Wiederhake 9d40ecacb5 AK: Fix signed overflow in unix time parts parsing 2023-05-27 12:17:50 +02:00
Ben Wiederhake 815ea06d2c AK: Test from_unix_time_parts intensively 2023-05-27 12:17:50 +02:00
kleines Filmröllchen 213025f210 AK: Rename Time to Duration
That's what this class really is; in fact that's what the first line of
the comment says it is.

This commit does not rename the main files, since those will contain
other time-related classes in a little bit.
2023-05-24 23:18:07 +02:00
Ben Wiederhake 6421899078 AK: Rewrite HashMap::clone signature with template-args and const 2023-05-19 22:33:57 +02:00
Tim Schumacher 221b91ff61 AK: Add CircularBuffer::find_copy_in_seekback()
This is useful for compressors, which quite frequently need to find a
matching span of data within the seekback.
2023-05-17 09:08:53 +02:00
Tim Schumacher d194011570 AK: Add count_required_bits 2023-05-17 09:08:53 +02:00
Ben Wiederhake f890b70eae Tests: Prefer TRY_OR_FAIL() and MUST() over EXPECT(!.is_error())
Note that in some cases (in particular SQL::Result and PDFErrorOr),
there is no Formatter defined for the error type, hence TRY_OR_FAIL
cannot work as-is. Furthermore, this commit leaves untouched the places
where MUST could be replaced by TRY_OR_FAIL.

Inspired by:
https://github.com/SerenityOS/serenity/pull/18710#discussion_r1186892445
2023-05-14 15:39:38 -06:00
Kemal Zebari 5ba5beb50f Tests: Add more tests for JsonArray
At an attempt to detect future regressions in AK/JsonArray, this
snapshot adds additional tests for it to TestJSON.cpp.
2023-05-08 07:39:49 +01:00
Ben Wiederhake 36ff6187f6 Everywhere: Change spelling of 'behaviour' to 'behavior'
"The official project language is American English […]."
5d2e915623/CONTRIBUTING.md?plain=1#L30

Here's a short statistic of the occurrences of the word "behavio(u)r":

$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
      2 BEHAVIOR
     24 Behaviour
     32 behaviour
    407 Behavior
    992 behavior

Therefore, it is clear that "behaviour" (56 occurrences) should be
regarded a typo, and "behavior" (1401 occurrences) should be preferred.

Note that The occurrences in LibJS are intentionally NOT changed,
because there are taken verbatim from the specification. Hence:

$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
      2 BEHAVIOR
     10 behaviour
     24 Behaviour
    407 Behavior
   1014 behavior
2023-05-07 01:05:09 +02:00
Ben Wiederhake ee47c0275e Everywhere: Run spellcheck on all documentation 2023-05-07 01:05:09 +02:00
Daniel Bertalan 00b4976f2c Everywhere: Make Lagom build with GCC 13
GCC 13 was released on 2023-04-26. This commit fixes Lagom build errors
when using an updated host toolchain:
- Adds a workaround for a bug in constraint handling, which made LibJS
  fail to compile: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109683
- Silences the new `-Wdangling-reference` diagnostic globally. It
  produces multiple false positives with no clear way to silence them
  without `#pragmas`.
- Silences `-Wself-move` in `RefPtr` tests as GCC 13 adds this
  previously Clang-exclusive warning.
2023-05-02 07:03:57 -04:00
Dan Klishch a179383dcc AK: Add benchmarks for floating point parsing 2023-04-30 06:05:54 +02:00
Aliaksandr Kalenik 4c6564e3c1 AK: Add values() method in HashTable
Add HashTable::values() method that returns all values.
2023-04-28 18:11:44 +02:00
Sam Atkins 892470a912 AK: Add Array::contains_slow() and ::first_index_of(), with tests :^) 2023-04-21 20:44:47 +01:00
Andreas Kling b7e847e58b AK: Fix crash during teardown of self-owning objects
We now null out smart pointers *before* calling unref on the pointee.
This ensures that the same smart pointer can't be used to acquire a new
reference to the pointee after its destruction has begun.

I ran into this when destroying a non-empty IntrusiveList of RefPtrs,
but the problem was more general so this fixes it for all of RefPtr,
NonnullRefPtr, OwnPtr and NonnullOwnPtr.
2023-04-21 18:15:00 +02:00
MacDue 5db1eb9961 AK+Everywhere: Replace URL::paths() with path_segment_at_index()
This allows accessing and looping over the path segments in a URL
without necessarily allocating a new vector if you want them percent
decoded too (which path_segment_at_index() has an option for).
2023-04-15 06:37:04 +02:00
MacDue 35612c6a7f AK+Everywhere: Change URL::path() to serialize_path()
This now defaults to serializing the path with percent decoded segments
(which is what all callers expect), but has an option not to. This fixes
`file://` URLs with spaces in their paths.

The name has been changed to serialize_path() path to make it more clear
that this method will generate a new string each call (except for the
cannot_be_a_base_url() case). A few callers have then been updated to
avoid repeatedly calling this function.
2023-04-15 06:37:04 +02:00
Tim Schumacher b1136ba357 AK: Efficiently resize CircularBuffer seekback copy distance
Previously, if we copied the last byte for a length of 100, we'd
recalculate the read span 100 times and memmove one byte 100 times,
which resulted in a lot of overhead.

Now, if we know that we have two consecutive copies of the data, we just
extend the distance to cover both copies, which halves the number of
times that we recalculate the span and actually call memmove.

This takes the running time of the attached benchmark case from 150ms
down to 15ms.
2023-04-14 10:03:42 +02:00
Sam Atkins 89e55c5297 AK+Tests: Add Vector::find_first_index_if() 2023-04-13 09:53:47 +02:00
MacDue 8283e8b88c AK: Don't store parts of URLs percent decoded
As noted in serval comments doing this goes against the WC3 spec,
and breaks parsing then re-serializing URLs that contain percent
encoded data, that was not encoded using the same character set as
the serializer.

For example, previously if you had a URL like:

https:://foo.com/what%2F%2F (the path is what + '//' percent encoded)

Creating URL("https:://foo.com/what%2F%2F").serialize() would return:

https://foo.com/what//

Which is incorrect and not the same as the URL we passed. This is
because the re-serializing uses the PercentEncodeSet::Path which
does not include '/'.

Only doing the percent encoding in the setters fixes this, which
is required to navigate to Google Street View (which includes a
percent encoded URL in its URL).

Seems to fix #13477 too
2023-04-12 07:40:22 +02:00
networkException 9915fa72fb AK+Everywhere: Use Optional for URLParser::parse's base_url parameter 2023-04-11 16:28:20 +02:00
Tim Ledbetter 72ea046b68 AK: Add option to the string formatter to use a digit separator
`vformat()` can now accept format specifiers of the form
{:'[numeric-type]}. This will output a number with a comma separator
every 3 digits.

For example:

`dbgln("{:'d}", 9999999);` will output 9,999,999.

Binary, octal and hexadecimal numbers can also use this feature, for
example:

`dbgln("{:'x}", 0xffffffff);` will output ff,fff,fff.
2023-04-11 13:03:30 +02:00
Kenneth Myhra d6cf9f5329 AK: Add FlyString::is_one_of for variadic string comparison 2023-04-06 23:49:08 +02:00
Timothy Flynn eed956b473 AK: Increase LittleEndianOutputBitStream's buffer size and remove loops
This is very similar to the LittleEndianInputBitStream bit buffer change
from 8e834d4bb2.

We currently buffer one byte of data for the underlying stream. And when
we put bits onto that buffer, we do so 1 bit at a time.

This replaces the u8 buffer with a u64. And instead of looping at all,
we perform bitwise operations to write the desired number of bits.

Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), compression time
decreases from:

    13.62s to 10.9s on Serenity (cold)
    13.62s to 9.22s on Serenity (warm)
    2.93s to 2.32s on Linux

One caveat is that this requires explicitly flushing any leftover bits
when the caller is done with the stream. The byte buffer implementation
implicitly flushed its data every time the buffer was byte-aligned, as
doing so would always fill the byte. This is no longer the case. But for
now, this should be fine as the one user of this class, DEFLATE, already
has a "flush everything now that we're done" finalizer.
2023-04-02 10:54:37 +02:00
Jelle Raaijmakers 954d660094 AK: Clear OrderedHashTable previous/next pointers on removal
With Clang, the previous/next pointers in buckets of an
`OrderedHashTable` are not cleared when a bucket is being shifted up as
a result of a removed bucket. As a result, an unfortunate pointer mixup
could lead to an infinite loop in the `HashTable` iterator, which was
exposed in `HashMap::keys()`.

Co-authored-by: Luke Wilde <lukew@serenityos.org>
2023-03-15 21:43:52 +01:00
gustrb 5141c86587 AK: Rename CaseInsensitiveStringViewTraits to reflect intent
Now it is called `CaseInsensitiveASCIIStringViewTraits`, so we can be
more specific about what data structure does it operate onto. ;)
2023-03-14 21:34:32 +00:00
Tim Schumacher ae51c1821c Everywhere: Remove unintentional partial stream reads and writes 2023-03-13 15:16:20 +00:00
Tim Schumacher ecd1862859 AK: Rename Stream::write_entire_buffer to Stream::write_until_depleted
No functional changes.
2023-03-13 15:16:20 +00:00
Tim Schumacher d5871f5717 AK: Rename Stream::{read,write} to Stream::{read_some,write_some}
Similar to POSIX read, the basic read and write functions of AK::Stream
do not have a lower limit of how much data they read or write (apart
from "none at all").

Rename the functions to "read some [data]" and "write some [data]" (with
"data" being omitted, since everything here is reading and writing data)
to make them sufficiently distinct from the functions that ensure to
use the entire buffer (which should be the go-to function for most
usages).

No functional changes, just a lot of new FIXMEs.
2023-03-13 15:16:20 +00:00
Timothy Flynn 1393ed2000 AK+LibUnicode: Implement String::equals_ignoring_case without allocating
We currently fully casefold the left- and right-hand sides to compare
two strings with case-insensitivity. Now, we casefold one code point at
a time, storing the result in a view for comparison, until we exhaust
both strings.
2023-03-08 18:57:53 +00:00
Timothy Flynn 515fca4f7a AK: Make String::contains(code_point) handle non-ASCII
We currently only accept a char, instead of a full code point.
2023-03-08 14:16:47 +00:00
Timothy Flynn f882581e91 AK: Make String::{starts,ends}_with(code_point) handle non-ASCII
We currently pass the code point to StringView::{starts,ends}_with,
which actually accepts a single char, thus cannot handle non-ASCII
code points.
2023-03-08 14:16:47 +00:00
Andreas Kling 21db2b7b90 Everywhere: Remove NonnullOwnPtr.h includes 2023-03-06 23:46:35 +01:00
Andreas Kling 359d6e7b0b Everywhere: Stop using NonnullOwnPtrVector
Same as NonnullRefPtrVector: weird semantics, questionable benefits.
2023-03-06 23:46:35 +01:00