Commit graph

452 commits

Author SHA1 Message Date
Andreas Kling f0ec104131 AK: Implement IPv6 host parsing in URLParser
This is just a straight (and fairly inefficient) implementation of IPv6
parsing and serialization from the URL spec.

Note that we don't use AK::IPv6Address here because the URL spec
requires a specific serialization behavior.
2023-07-17 07:47:58 +02:00
Shannon Booth 5625ca5cb9 AK: Rename URLParser::parse to URLParser::basic_parse
To make it more clear that this function implements
'concept-basic-url-parser' instead of 'concept-url-parser'.
2023-07-15 09:45:16 +02:00
Timothy Flynn c911781c21 Everywhere: Remove needless trailing semi-colons after functions
This is a new option in clang-format-16.
2023-07-08 10:32:56 +01:00
Tim Schumacher 60ac254df6 AK: Use hashing to accelerate searching a CircularBuffer 2023-07-06 15:06:20 +01:00
Tim Schumacher 42d01b21d8 AK: Rewrite the hint-based CircularBuffer::find_copy_in_seekback
This now searches the memory in blocks, which should be slightly more
efficient. However, it doesn't make much difference (e.g. ~1% in LZMA
compression) in most real-world applications, as the non-hint function
is more expensive by orders of magnitude.
2023-07-06 15:06:20 +01:00
Tim Schumacher 046a9faeb3 AK: Split up CircularBuffer::find_copy_in_seekback
The "operation modes" of this function have very different focuses, and
trying to combine both in a way where we share the most amount of code
probably results in the worst performance.

Instead, split up the function into "existing distances" and "no
existing distances" so that we can optimize either case separately.
2023-07-06 15:06:20 +01:00
Tim Schumacher 9e82ad758e AK: Move parts for searching CircularBuffer into a new class
We will be adding extra logic to the CircularBuffer to optimize
searching, but this would negatively impact the performance of
CircularBuffer users that don't need that functionality.
2023-07-06 15:06:20 +01:00
Valtteri Koskivuori 5e5493e334 AK: Add URLParser relative file URL test
I was debugging a different issue in Ladybird, and noticed that
completing relative file URLs with URL::complete_url didn't seem to work
right. This test case covers both the working https case, as well as the
file URL case fixed by the previous commit.
2023-06-18 15:16:08 +02:00
Ben Wiederhake 2b2e0a4c5a Tests: Use AK_MAKE_DEFAULT_MOVABLE to avoid mistakes in default impls 2023-06-18 08:47:51 +01:00
Sam Atkins 8e53e5afc4 AK: Propagate errors from SourceGenerator::fork() 2023-06-17 17:48:06 +01:00
Peter Brottveit Bock 49b29332f2 AK: Migrate IPv6Address::to_deprecated_string() to ::to_string()
Change the name and return type of
`IPv6Address::to_deprecated_string()` to `IPv6Address::to_string()`
with return type `ErrorOr<String>`.

It will now propagate errors that occur when writing to the
StringBuilder.

There are two users of `to_deprecated_string()` that now use
`to_string()`:

1. `Formatted<IPv6Address>`: it now propagates errors.

2. `inet_ntop`: it now sets errno to ENOMEM and returns.
2023-06-09 19:38:14 +01:00
Peter Brottveit Bock 43a8a38f57 AK: Test that IPv6Address::operator== only looks at bytes in address
This commit extends the test to ensure that different objects with
the same inet6-address are considered equal.
2023-06-09 19:38:14 +01:00
Tim Schumacher 58b1d9c319 AK: Correctly calculate size of the last AllocatingMemoryStream chunk 2023-05-29 13:30:46 +02:00
Tim Schumacher 52aab50914 AK: Handle empty trailing chunks in AllocatingMemoryStream::offset_of 2023-05-29 13:30:46 +02:00
Tim Schumacher 9a7ae52b31 AK: Expose AllocatingMemoryStream::CHUNK_SIZE
This allows the tests to use that information confidently.
2023-05-29 13:30:46 +02:00
Ben Wiederhake 5fafd82927 AK+Everywhere: Don't crash on invalid months
Sadly, we don't have proper error propagation here. However, crashing
the Kernel just because a CDROM contains an invalid month seems like a
bad idea.
2023-05-27 12:17:50 +02:00
Ben Wiederhake 9d40ecacb5 AK: Fix signed overflow in unix time parts parsing 2023-05-27 12:17:50 +02:00
Ben Wiederhake 815ea06d2c AK: Test from_unix_time_parts intensively 2023-05-27 12:17:50 +02:00
kleines Filmröllchen 213025f210 AK: Rename Time to Duration
That's what this class really is; in fact that's what the first line of
the comment says it is.

This commit does not rename the main files, since those will contain
other time-related classes in a little bit.
2023-05-24 23:18:07 +02:00
Ben Wiederhake 6421899078 AK: Rewrite HashMap::clone signature with template-args and const 2023-05-19 22:33:57 +02:00
Tim Schumacher 221b91ff61 AK: Add CircularBuffer::find_copy_in_seekback()
This is useful for compressors, which quite frequently need to find a
matching span of data within the seekback.
2023-05-17 09:08:53 +02:00
Tim Schumacher d194011570 AK: Add count_required_bits 2023-05-17 09:08:53 +02:00
Ben Wiederhake f890b70eae Tests: Prefer TRY_OR_FAIL() and MUST() over EXPECT(!.is_error())
Note that in some cases (in particular SQL::Result and PDFErrorOr),
there is no Formatter defined for the error type, hence TRY_OR_FAIL
cannot work as-is. Furthermore, this commit leaves untouched the places
where MUST could be replaced by TRY_OR_FAIL.

Inspired by:
https://github.com/SerenityOS/serenity/pull/18710#discussion_r1186892445
2023-05-14 15:39:38 -06:00
Kemal Zebari 5ba5beb50f Tests: Add more tests for JsonArray
At an attempt to detect future regressions in AK/JsonArray, this
snapshot adds additional tests for it to TestJSON.cpp.
2023-05-08 07:39:49 +01:00
Ben Wiederhake 36ff6187f6 Everywhere: Change spelling of 'behaviour' to 'behavior'
"The official project language is American English […]."
5d2e915623/CONTRIBUTING.md?plain=1#L30

Here's a short statistic of the occurrences of the word "behavio(u)r":

$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
      2 BEHAVIOR
     24 Behaviour
     32 behaviour
    407 Behavior
    992 behavior

Therefore, it is clear that "behaviour" (56 occurrences) should be
regarded a typo, and "behavior" (1401 occurrences) should be preferred.

Note that The occurrences in LibJS are intentionally NOT changed,
because there are taken verbatim from the specification. Hence:

$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
      2 BEHAVIOR
     10 behaviour
     24 Behaviour
    407 Behavior
   1014 behavior
2023-05-07 01:05:09 +02:00
Ben Wiederhake ee47c0275e Everywhere: Run spellcheck on all documentation 2023-05-07 01:05:09 +02:00
Daniel Bertalan 00b4976f2c Everywhere: Make Lagom build with GCC 13
GCC 13 was released on 2023-04-26. This commit fixes Lagom build errors
when using an updated host toolchain:
- Adds a workaround for a bug in constraint handling, which made LibJS
  fail to compile: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109683
- Silences the new `-Wdangling-reference` diagnostic globally. It
  produces multiple false positives with no clear way to silence them
  without `#pragmas`.
- Silences `-Wself-move` in `RefPtr` tests as GCC 13 adds this
  previously Clang-exclusive warning.
2023-05-02 07:03:57 -04:00
Dan Klishch a179383dcc AK: Add benchmarks for floating point parsing 2023-04-30 06:05:54 +02:00
Aliaksandr Kalenik 4c6564e3c1 AK: Add values() method in HashTable
Add HashTable::values() method that returns all values.
2023-04-28 18:11:44 +02:00
Sam Atkins 892470a912 AK: Add Array::contains_slow() and ::first_index_of(), with tests :^) 2023-04-21 20:44:47 +01:00
Andreas Kling b7e847e58b AK: Fix crash during teardown of self-owning objects
We now null out smart pointers *before* calling unref on the pointee.
This ensures that the same smart pointer can't be used to acquire a new
reference to the pointee after its destruction has begun.

I ran into this when destroying a non-empty IntrusiveList of RefPtrs,
but the problem was more general so this fixes it for all of RefPtr,
NonnullRefPtr, OwnPtr and NonnullOwnPtr.
2023-04-21 18:15:00 +02:00
MacDue 5db1eb9961 AK+Everywhere: Replace URL::paths() with path_segment_at_index()
This allows accessing and looping over the path segments in a URL
without necessarily allocating a new vector if you want them percent
decoded too (which path_segment_at_index() has an option for).
2023-04-15 06:37:04 +02:00
MacDue 35612c6a7f AK+Everywhere: Change URL::path() to serialize_path()
This now defaults to serializing the path with percent decoded segments
(which is what all callers expect), but has an option not to. This fixes
`file://` URLs with spaces in their paths.

The name has been changed to serialize_path() path to make it more clear
that this method will generate a new string each call (except for the
cannot_be_a_base_url() case). A few callers have then been updated to
avoid repeatedly calling this function.
2023-04-15 06:37:04 +02:00
Tim Schumacher b1136ba357 AK: Efficiently resize CircularBuffer seekback copy distance
Previously, if we copied the last byte for a length of 100, we'd
recalculate the read span 100 times and memmove one byte 100 times,
which resulted in a lot of overhead.

Now, if we know that we have two consecutive copies of the data, we just
extend the distance to cover both copies, which halves the number of
times that we recalculate the span and actually call memmove.

This takes the running time of the attached benchmark case from 150ms
down to 15ms.
2023-04-14 10:03:42 +02:00
Sam Atkins 89e55c5297 AK+Tests: Add Vector::find_first_index_if() 2023-04-13 09:53:47 +02:00
MacDue 8283e8b88c AK: Don't store parts of URLs percent decoded
As noted in serval comments doing this goes against the WC3 spec,
and breaks parsing then re-serializing URLs that contain percent
encoded data, that was not encoded using the same character set as
the serializer.

For example, previously if you had a URL like:

https:://foo.com/what%2F%2F (the path is what + '//' percent encoded)

Creating URL("https:://foo.com/what%2F%2F").serialize() would return:

https://foo.com/what//

Which is incorrect and not the same as the URL we passed. This is
because the re-serializing uses the PercentEncodeSet::Path which
does not include '/'.

Only doing the percent encoding in the setters fixes this, which
is required to navigate to Google Street View (which includes a
percent encoded URL in its URL).

Seems to fix #13477 too
2023-04-12 07:40:22 +02:00
networkException 9915fa72fb AK+Everywhere: Use Optional for URLParser::parse's base_url parameter 2023-04-11 16:28:20 +02:00
Tim Ledbetter 72ea046b68 AK: Add option to the string formatter to use a digit separator
`vformat()` can now accept format specifiers of the form
{:'[numeric-type]}. This will output a number with a comma separator
every 3 digits.

For example:

`dbgln("{:'d}", 9999999);` will output 9,999,999.

Binary, octal and hexadecimal numbers can also use this feature, for
example:

`dbgln("{:'x}", 0xffffffff);` will output ff,fff,fff.
2023-04-11 13:03:30 +02:00
Kenneth Myhra d6cf9f5329 AK: Add FlyString::is_one_of for variadic string comparison 2023-04-06 23:49:08 +02:00
Timothy Flynn eed956b473 AK: Increase LittleEndianOutputBitStream's buffer size and remove loops
This is very similar to the LittleEndianInputBitStream bit buffer change
from 8e834d4bb2.

We currently buffer one byte of data for the underlying stream. And when
we put bits onto that buffer, we do so 1 bit at a time.

This replaces the u8 buffer with a u64. And instead of looping at all,
we perform bitwise operations to write the desired number of bits.

Using the "enwik8" file as a test (100MB uncompressed, commonly used in
benchmarks: https://www.mattmahoney.net/dc/enwik8.zip), compression time
decreases from:

    13.62s to 10.9s on Serenity (cold)
    13.62s to 9.22s on Serenity (warm)
    2.93s to 2.32s on Linux

One caveat is that this requires explicitly flushing any leftover bits
when the caller is done with the stream. The byte buffer implementation
implicitly flushed its data every time the buffer was byte-aligned, as
doing so would always fill the byte. This is no longer the case. But for
now, this should be fine as the one user of this class, DEFLATE, already
has a "flush everything now that we're done" finalizer.
2023-04-02 10:54:37 +02:00
Jelle Raaijmakers 954d660094 AK: Clear OrderedHashTable previous/next pointers on removal
With Clang, the previous/next pointers in buckets of an
`OrderedHashTable` are not cleared when a bucket is being shifted up as
a result of a removed bucket. As a result, an unfortunate pointer mixup
could lead to an infinite loop in the `HashTable` iterator, which was
exposed in `HashMap::keys()`.

Co-authored-by: Luke Wilde <lukew@serenityos.org>
2023-03-15 21:43:52 +01:00
gustrb 5141c86587 AK: Rename CaseInsensitiveStringViewTraits to reflect intent
Now it is called `CaseInsensitiveASCIIStringViewTraits`, so we can be
more specific about what data structure does it operate onto. ;)
2023-03-14 21:34:32 +00:00
Tim Schumacher ae51c1821c Everywhere: Remove unintentional partial stream reads and writes 2023-03-13 15:16:20 +00:00
Tim Schumacher ecd1862859 AK: Rename Stream::write_entire_buffer to Stream::write_until_depleted
No functional changes.
2023-03-13 15:16:20 +00:00
Tim Schumacher d5871f5717 AK: Rename Stream::{read,write} to Stream::{read_some,write_some}
Similar to POSIX read, the basic read and write functions of AK::Stream
do not have a lower limit of how much data they read or write (apart
from "none at all").

Rename the functions to "read some [data]" and "write some [data]" (with
"data" being omitted, since everything here is reading and writing data)
to make them sufficiently distinct from the functions that ensure to
use the entire buffer (which should be the go-to function for most
usages).

No functional changes, just a lot of new FIXMEs.
2023-03-13 15:16:20 +00:00
Timothy Flynn 1393ed2000 AK+LibUnicode: Implement String::equals_ignoring_case without allocating
We currently fully casefold the left- and right-hand sides to compare
two strings with case-insensitivity. Now, we casefold one code point at
a time, storing the result in a view for comparison, until we exhaust
both strings.
2023-03-08 18:57:53 +00:00
Timothy Flynn 515fca4f7a AK: Make String::contains(code_point) handle non-ASCII
We currently only accept a char, instead of a full code point.
2023-03-08 14:16:47 +00:00
Timothy Flynn f882581e91 AK: Make String::{starts,ends}_with(code_point) handle non-ASCII
We currently pass the code point to StringView::{starts,ends}_with,
which actually accepts a single char, thus cannot handle non-ASCII
code points.
2023-03-08 14:16:47 +00:00
Andreas Kling 21db2b7b90 Everywhere: Remove NonnullOwnPtr.h includes 2023-03-06 23:46:35 +01:00
Andreas Kling 359d6e7b0b Everywhere: Stop using NonnullOwnPtrVector
Same as NonnullRefPtrVector: weird semantics, questionable benefits.
2023-03-06 23:46:35 +01:00
Dan Klishch 2d27c98659 AK: Implement Knuth's algorithm D for dividing UFixedBigInt's 2023-03-04 22:10:03 -07:00
Dan Klishch 2470fab05e AK: Delete unused and untested sqrt, pow and pow_mod from UFixedBigInt 2023-03-04 22:10:03 -07:00
Timothy Flynn da0d000909 AK: Ensure short String instances are valid UTF-8
We are currently only validating long strings.
2023-03-03 11:46:42 -05:00
Timothy Flynn c4d78c29a2 AK: Invalidate overlong UTF-8 code point encodings
For example, the code point U+002F could be encoded as UTF-8 with the
bytes 0x80 0xAF. This trick has historically been used to bypass
security checks.
2023-03-03 11:46:42 -05:00
Nico Weber b898a46d7f AK: Make FixedPoint(FloatingPoint) ctor round instead of truncating
This is needed to have code for creating an in-memory sRGB profile using
the (floating-ppoint) numbers from the sRGB spec and having the
fixed-point values in the profile match what they are in other software
(such as GIMP).

It has the side effect of making the FixedPoint ctor no longer constexpr
(which seems fine; nothing was currently relying on that).

Some of FixedPoint's member functions don't round yet, which requires
tweaking a test.
2023-03-03 09:23:02 +00:00
Sam Atkins c06f4ac6f5 AK+Everywhere: Make GenericLexer::ignore_until() stop before the value
`consume_until(foo)` stops before foo, and so does
`ignore_until(Predicate)`, so let's make the other `ignore_until()`
overloads consistent with that so they're less confusing.
2023-02-28 12:55:10 +00:00
Tim Ledbetter 6b2f3ad6c8 AK: Fix DeprecatedString::bijective_base_from for large numbers
The output of the DeprecatedString::bijective_base_from() is now
correct for numbers larger than base^2.

This makes column names display correctly in Spreadsheet.
2023-02-26 21:06:21 +03:30
Linus Groh 09d40bfbb2 Everywhere: Use _{short_,}string to create Strings from literals 2023-02-25 20:51:49 +01:00
Hediadyoin1 fd8c54d720 AK: Add take_first to HashTable and rename pop to take_last
This naming scheme matches Vector.

This also changes `take_last` to move the value it takes, and delete by
known pointer, avoiding a full lookup and potential copies.
2023-02-21 22:13:06 +01:00
Andrew Kaster 0ea697ace5 AK: Add String::from_stream method
The caller is responsible for determining how long the string is that
they want to read.
2023-02-21 10:57:44 +01:00
Nico Weber a30e364f1a AK: Fix printing of negative FixedPoint values
Fixes #17514 by comparing to 0 before truncating the fractional part.
2023-02-18 19:34:10 +01:00
Andreas Kling d0697d350d AK: Fix 64-bit alignment issue in shared-superstring substrings
Thanks to Timothy Flynn for the test!

Fixes #17141
2023-02-18 09:12:46 -05:00
Jelle Raaijmakers 4cd3a84c4b AK: Remove unused rehash_for_collision 2023-02-17 22:29:51 -07:00
Jelle Raaijmakers c08d137fcd AK: Reimplement HashTable with smart linear probing
Instead of rehashing on collisions, we use Robin Hood hashing: a simple
linear probe where we keep track of the distance between the bucket and
its ideal position. On insertion, we allow a new bucket to "steal" the
position of "rich" buckets (those near their ideal position) and move
them further down.

On removal, we shift buckets back up into the freed slot, decrementing
their distance while doing so.

This behavior automatically optimizes the number of required probes for
any value, and removes the need for periodic rehashing (except when
expanding the capacity).
2023-02-17 22:29:51 -07:00
Jelle Raaijmakers f4342c9118 AK: Add AK::SIMD::exp_approximate
This approximation tries to generate values within 0.1% of their actual
expected value. Microbenchmarks indicate that this iterative SIMD
version can be up to 60x faster than `AK::SIMD::exp`.
2023-02-18 01:45:00 +01:00
Sam Atkins abc01cc9fe AK+Tests+LibWeb: Make URL::complete_url() take a StringView
All it does is pass this to `URLParser::parse()` which takes a
StringView, so we might as well take one here too.
2023-02-15 12:48:26 -05:00
Timothy Flynn 5cbf054651 LibUnicode: Fix typos causing text segmentation on mid-word punctuation
For example the words "can't" and "32.3" should not have boundaries
detected on the "'" and "." code points, respectively.

The String test cases fixed here are because "b'ar" is now considered
one word.
2023-02-15 12:36:47 +01:00
Tim Schumacher 43f98ac6e1 Everywhere: Remove the AK:: qualifier from Stream usages 2023-02-13 00:50:07 +00:00
Tim Schumacher 874c7bba28 LibCore: Remove Stream.h 2023-02-13 00:50:07 +00:00
MacDue 654950eaf7 Tests: Add a few tests to verify vectors are using inline storage 2023-02-11 00:03:14 +01:00
Luke Wilde d9d556fbab AK: Allow Vector<ByteBuffer>::contains_slow to accept (Readonly)Bytes
This is done by providing Traits<ByteBuffer>::equals functions for
(Readonly)Bytes, as the base GenericTraits<T>::equals is unable to
convert the ByteBuffer to (Readonly)Bytes to then use Span::operator==

This allows us to check if a Vector<ByteBuffer> contains a
(Readonly)Bytes without having to making a copy of it into a ByteBuffer
first. The initial use of this is in LibWeb with CORS-preflight, where
we check the split contents of the Access-Control headers with
Fetch::Infrastructure::Request::method() and static StringViews
such as "*"sv.bytes().
2023-02-10 22:18:19 +00:00
Tim Schumacher e8d5e938de AK: Remove the deprecated Stream implementation :^) 2023-02-08 19:18:26 +00:00
MacDue 63b11030f0 Everywhere: Use ReadonlySpan<T> instead of Span<T const> 2023-02-08 19:15:45 +00:00
Tim Schumacher 220fbcaa7e AK: Remove the fallible constructor from FixedMemoryStream 2023-02-08 17:44:32 +00:00
Tim Schumacher 8b2f23d016 AK: Remove the fallible constructor from LittleEndianOutputBitStream 2023-02-08 17:44:32 +00:00
Tim Schumacher 0fee97916b AK: Remove the fallbile constructor from BigEndianOutputBitStream 2023-02-08 17:44:32 +00:00
Tim Schumacher 261d62438f AK: Remove the fallible constructor from LittleEndianInputBitStream 2023-02-08 17:44:32 +00:00
Tim Schumacher fa09152e23 AK: Remove the fallible constructor from BigEndianInputBitStream 2023-02-08 17:44:32 +00:00
Tim Schumacher 47531a42a9 AK: Make LEB128 decoding work with read_value 2023-02-04 18:41:27 -07:00
Tim Schumacher 787f4d639a AK: Port LEB128 to the new AK::Stream 2023-02-04 18:41:27 -07:00
Staubfinger da1023fcc5 AK: Add thresholds to quickselect_inline and Statistics::Median
I did a bit of Profiling and made the quickselect and median algorithms
use the best of option for the respective input size.
2023-02-03 19:04:15 +01:00
Staubfinger becd6d106f AK: Testing for AK::quickselect_inline
The testing code found here is mainly derived from the Tests for
`AK::quick_sort`.
2023-02-03 19:04:15 +01:00
Timothy Flynn 2af7447dfd AK: Define HashMap::take to find and remove a value from the map 2023-02-02 19:14:00 +00:00
Timothy Flynn f31bd190b0 AK: Ensure string types are actually considered hash-compatible
The AnyString concept is currently broken because it checks whether a
StringView is constructible from a type T. The StringView constructors,
however, only accept constant rvalue references - i.e. `T const&`.

This also adds a test to ensure this continues to work.
2023-02-02 19:14:00 +00:00
Ben Wiederhake e147d0b572 AK: Fix all quadratic-time append-loops over ByteBuffer 2023-01-31 16:58:25 +01:00
Tim Schumacher 093cf428a3 AK: Move memory streams from LibCore 2023-01-29 19:16:44 -07:00
Tim Schumacher 2470dd3bb5 AK: Move bit streams from LibCore 2023-01-29 19:16:44 -07:00
Tim Schumacher ae64b68717 AK: Deprecate the old AK::Stream
This also removes a few cases where the respective header wasn't
actually required to be included.
2023-01-29 19:16:44 -07:00
Linus Groh 9c08bb9555 AK: Remove try_ prefix from FixedArray creation functions 2023-01-28 22:41:36 +01:00
Timothy Flynn c59268d15b AK: Add String::trim 2023-01-28 00:13:46 +00:00
Timothy Flynn cccaa94767 AK: Add String::join 2023-01-28 00:13:46 +00:00
Linus Groh 6e7459322d AK: Remove StringBuilder::build() in favor of to_deprecated_string()
Having an alias function that only wraps another one is silly, and
keeping the more obvious name should flush out more uses of deprecated
strings.
No behavior change.
2023-01-27 20:38:49 +00:00
Timothy Flynn c35b1371a3 AK: Add an overload of String::find_byte_offset for StringView 2023-01-27 18:00:17 +00:00
Sam Atkins d55f763fcd Tests: Replace uses of JsonObject::get_deprecated()/get_ptr() 2023-01-26 09:57:14 -05:00
Timothy Flynn 427b82065c AK: Add a method to create a String with a repeated code point 2023-01-24 16:23:50 -05:00
Timothy Flynn d50724956e AK: Add a method to find the byte offset of a code point 2023-01-24 16:23:50 -05:00
Nico Weber 8b5b767465 AK: Print leading zeroes after the dot for FixedPoint numbers
As a nearby comment says, "This is a terrible approximation".
This doesn't make things less terrible, but it does make things
more correct in the given framework of terribleness.

Fixes #17156.
2023-01-24 13:24:21 +01:00
Timothy Flynn 12c8bc3e85 AK: Add a String factory to create a string from a single code point 2023-01-22 01:03:13 +00:00
martinfalisse aec2dadfdd AK: Add split() for String 2023-01-21 14:35:00 +01:00
Timothy Flynn 4f5353cbb8 AK: Rename double_hash to rehash_for_collision
The name is currently quite confusing as it indicates it hashes doubles.
2023-01-21 10:36:14 +01:00
Tim Schumacher d7eead4f4c AK: Remove DuplexMemoryStream 2023-01-20 20:48:40 +00:00
Timothy Flynn d48266a420 AK: Support creating known short string literals at compile time
In cases where we know a string literal will fit in the short string
storage, we can do so at compile time without needing to handle error
propagation. If the provided string literal is too long, a compilation
error will be emitted due to the failed VERIFY statement being a non-
constant expression.
2023-01-20 14:24:12 -05:00
Timothy Flynn 537fcaf59e AK+LibUnicode: Provide Unicode-aware caseless String matching
The Unicode spec defines much more complicated caseless matching
algorithms in its Collation spec. This implements the "basic" case
folding comparison.
2023-01-18 14:43:40 +00:00
Sam Atkins 1dd6b7f5b7 AK+Everywhere: Rename JsonObject::get() to ::get_deprecated()
This is a preparatory step to making `get()` return `ErrorOr`.
2023-01-17 19:52:52 -05:00
Sam Atkins efe4329f9f AK: Add JsonValue::{is,as}_integer() methods
The existing `is_i32()` and friends only check if `i32` is their
internal type, but a value such as `0` could be literally any integer
type internally. `is_integer<T>()` instead determines whether the
contained value is an integer and can fit inside T.
2023-01-17 19:52:52 -05:00
Timothy Flynn d6ddca0c0f AK+LibUnicode: Provide Unicode-aware String titlecase transformation 2023-01-16 18:33:44 -05:00
Lucas CHOLLET 3b824ec8c9 AK+Test: Fix a logic error in CircularBuffer::offset_of()
This error was introduced by 9a7accdd and had a significant impact on
`BufferedFile` behavior. Hence, we started seeing crash in test262.

By itself, the issue was a wrong calculation of the internal reading
spans when using the `read` and `until` parameters. Which can lead to
at worse crash in VERIFY and at least weird behaviors as missed needles
or detections out of bounds.

It was also accompanied by an erroneous test.

This patch fixes the bug, the test and also provides more tests.
2023-01-15 23:23:24 +00:00
Timothy Flynn bd9b65e82f AK: Add String::is_one_of for variadic string comparison 2023-01-15 01:00:20 +00:00
Timothy Flynn 9db9b2f9be AK: Add a somewhat naive implementation of String::reverse
This will reverse the String's code points (i.e. not just its bytes),
but is not aware of grapheme clusters.
2023-01-15 01:00:20 +00:00
Lucas CHOLLET 9a7accddb7 AK: Add an optional starting offset to CircularBuffer::offset_of
This parameter allows to start searching after an offset. For example,
to resume a search.

It is unfortunately a breaking change in API so this patch also modifies
one user and one test.
2023-01-14 16:20:30 -07:00
Ben Wiederhake 79b9dd6248 AK+Tests: Make CaseInsensitiveStringViewTraits work with HashMap again 2023-01-14 15:43:27 -07:00
Tim Schumacher 7526f9a8b7 AK: Remove CircularDuplexStream 2023-01-14 12:05:52 -05:00
Timothy Flynn 1d4f287582 AK: Implement FlyString for the new String class
This implements a FlyString that will de-duplicate String instances. The
FlyString will store the raw encoded data of the String instance: If the
String is a short string, FlyString holds the String::ShortString bytes;
otherwise FlyString holds a pointer to the Detail::StringData.

FlyString itself does not know about String's storage or how to refcount
its Detail::StringData. It defers to String to implement these details.
2023-01-12 11:23:58 +01:00
Timothy Flynn 6fcc1c7426 AK+LibUnicode: Provide Unicode-aware String case transformations
Since AK can't refer to LibUnicode directly, the strategy here is that
if you need case transformations, you can link LibUnicode and receive
them. If you try to use either of these methods without linking it, then
you'll of course get a linker error (note we don't do any fallbacks to
e.g. ASCII case transformations). If you don't need these methods, you
don't have to link LibUnicode.
2023-01-09 19:23:46 -07:00
Timothy Flynn f3db548a3d AK+Everywhere: Rename FlyString to DeprecatedFlyString
DeprecatedFlyString relies heavily on DeprecatedString's StringImpl, so
let's rename it to A) match the name of DeprecatedString, B) write a new
FlyString class that is tied to String.
2023-01-09 23:00:24 +00:00
Timothy Flynn 2eacc7aec1 AK: Add Utf16View::to_utf8 to convert the view to a UTF-8 AK::String 2023-01-09 23:00:24 +00:00
Timothy Flynn d0403ec14f AK+Everywhere: Rename Utf16View::to_utf8 to to_deprecated_string
A subsequent commit will add to_utf8 back to create an AK::String.
2023-01-09 23:00:24 +00:00
Timothy Flynn d793262beb AK+Everywhere: Make UTF-16 to UTF-8 converter fallible
This could fail to allocate the underlying storage needed to store the
UTF-8 data. Propagate this error.
2023-01-08 12:13:15 +01:00
Timothy Flynn 1edb96376b AK+Everywhere: Make UTF-8 and UTF-32 to UTF-16 converters fallible
These could fail to allocate the underlying storage needed to store the
UTF-16 data. Propagate these errors.
2023-01-08 12:13:15 +01:00
Andrew Kaster a8fcd39b88 AK: Reimplement comparisons on AK::Time using operator<=>
This allows us to make all comparision operators on the class constexpr
without pulling in a bunch of boilerplate. We don't use the `<compare>`
header because it doesn't compile in the main serenity cross-build due
to the include paths to LibC being incompatible with how libc++ expects
them to be for clang builds.
2023-01-07 14:51:04 +01:00
Thiago Henrique Hupner 401bc13776 AK: Use base URL when the specified URL is empty 2023-01-06 13:59:17 -07:00
Aliaksandr Kalenik c6d494513e AK: Fix typo in -= operator of DistinctNumeric 2023-01-06 12:01:46 +01:00
Ben Wiederhake c2a900b853 Everywhere: Remove unused includes of AK/StdLibExtras.h
These instances were detected by searching for files that include
AK/StdLibExtras.h, but don't match the regex:

\\b(abs|AK_REPLACED_STD_NAMESPACE|array_size|ceil_div|clamp|exchange|for
ward|is_constant_evaluated|is_power_of_two|max|min|mix|move|_RawPtr|RawP
tr|round_up_to_power_of_two|swap|to_underlying)\\b

(Without the linebreaks.)

This regex is pessimistic, so there might be more files that don't
actually use any "extra stdlib" functions.

In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
2023-01-02 20:27:20 -05:00
Ben Wiederhake 6fd478b6ce Everywhere: Remove unused includes of AK/Format.h
These instances were detected by searching for files that include
AK/Format.h, but don't match the regex:

\\b(CheckedFormatString|critical_dmesgln|dbgln|dbgln_if|dmesgln|FormatBu
ilder|__FormatIfSupported|FormatIfSupported|FormatParser|FormatString|Fo
rmattable|Formatter|__format_value|HasFormatter|max_format_arguments|out
|outln|set_debug_enabled|StandardFormatter|TypeErasedFormatParams|TypeEr
asedParameter|VariadicFormatParams|v_critical_dmesgln|vdbgln|vdmesgln|vf
ormat|vout|warn|warnln|warnln_if)\\b

(Without the linebreaks.)

This regex is pessimistic, so there might be more files that don't
actually use any formatting functions.

Observe that this revealed that Userland/Libraries/LibC/signal.cpp is
missing an include.

In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
2023-01-02 20:27:20 -05:00
Ben Wiederhake 3334cf675a AK+Kernel: Eliminate UB (signed overflow) from days_since_epoch 2023-01-02 16:19:35 -05:00
Ben Wiederhake 7a69219a35 AK+Tests: Replace years_to_days_since_epoch by near-instant function
This solves half the problem of #12729. Note that the inverse function
time_to_tm() in LibC/time.cpp still uses a slow for-loop.

See also #13138
2023-01-02 16:19:35 -05:00
Ben Wiederhake 60ee695287 AK+Tests: Demonstrate slowness of years_to_days_since_epoch
In practice, this function does not take any perceptible amount of time.
However, this benchmark demonstrates that for extreme values, the
internal for-loop does matter.
2023-01-02 16:19:35 -05:00
Lenny Maiorani e0ab7763da AK: Combine SinglyLinkedList and SinglyLinkedListWithCount
Using policy based design `SinglyLinkedList` and
`SinglyLinkedListWithCount` can be combined into one class which takes
a policy to determine how to keep track of the size of the list. The
default policy is to use list iteration to count the items in the list
each time. The `WithCount` form is a different policy which tracks the
size, but comes with the overhead of storing the count and
incrementing/decrementing on each modification.

This model is extensible to have other forms of counting by
implementing only a new policy instead of implementing a totally new
type.
2023-01-02 20:13:24 +00:00
Arda Cinar 0105600120 AK+Tests: Add a test for formatting numbers in base 10 units
Added a test case to TestNumberFormat to test the new base 10 formatting
capability
2023-01-02 20:11:18 +00:00
Ben Wiederhake b83cb09db1 Everywhere: Fix badly-formatted includes
In 7c5e30daaa, the focus was "only" on
Userland/Libraries/, whereas this commit cleans up the remaining
headers in the repo, and any new badly-formatted include.
2023-01-02 11:06:15 -05:00
Lucas CHOLLET f12e81b74a AK: Add CircularBuffer
The class is very similar to `CircularDuplexStream` in its behavior.
Main differences are that `CircularBuffer`:
 - does not inherit from `AK::Stream`
 - uses `ErrorOr` for its API
 - is heap allocated (and OOM-Safe)

 This patch also add some tests.
2022-12-31 04:44:17 -07:00
Nico Weber c051532c82 AK: Add tests for LittleEndian<enum class>
This should've been in 57126a0d3c.
2022-12-28 22:27:19 -05:00
Nico Weber 57126a0d3c AK: Make BigEndian<> and LittleEndian<> work with enum classes 2022-12-28 20:13:12 +00:00
Liav A feeb25bcee AK: Remove i686 support 2022-12-28 11:53:41 +01:00
Florian Cramer af2ffcaba8 AK: Make StringUtils::matches() handle escaping correctly
Previously any backslash and the character following it were ignored.
This commit adds a fall through to match the character following the
backslash without checking whether it is "special".
2022-12-27 07:28:25 +03:30
Timothy Flynn d2a304ae87 AK: Specialize TypeList for Variant types
This allows callers to use the following semantics:

    using MyVariant = Variant<Empty, int>;

    template<typename T>
    size_t size() { return TypeList<T>::size; }

    auto s = size<MyVariant>();

This will be needed for an upcoming IPC change, which will result in us
knowing the Variant type, but not the underlying variadic types that the
Variant holds.
2022-12-26 09:36:16 +01:00
Sam Atkins 29733e65f8 AK+Everywhere: Replace all Bitmap::must_create() uses with ::create()
Well, *someone* has to add some more FIXMEs to keep FIXME Roulette
going. :^)
2022-12-22 15:48:53 +01:00
Jelle Raaijmakers 25f2e4981c AK: Stop using DeprecatedString in Base64 encoding 2022-12-20 10:34:19 +01:00
Lenny Maiorani f2336d0144 AK+Everywhere: Move custom deleter capability to OwnPtr
`OwnPtrWithCustomDeleter` was a decorator which provided the ability
to add a custom deleter to `OwnPtr` by wrapping and taking the deleter
as a run-time argument to the constructor. This solution means that no
additional space is needed for the `OwnPtr` because it doesn't need to
store a pointer to the deleter, but comes at the cost of having an
extra type that stores a pointer for every instance.

This logic is moved directly into `OwnPtr` by adding a template
argument that is defaulted to the default deleter for the type. This
means that the type itself stores the pointer to the deleter instead
of every instance and adds some type safety by encoding the deleter in
the type itself instead of taking a run-time argument.
2022-12-17 16:00:08 -05:00
Lenny Maiorani 5875e66531 Tests: ASCII-betical-ize CMakeLists AK tests 2022-12-17 18:32:26 +01:00
Poseydon42 bdd7531bf5 AK: Create relative path even if prefix is not an ancestor of the path 2022-12-14 15:11:03 +00:00
Ali Mohammad Pur f96a3c002a Everywhere: Stop shoving things into ::std and mentioning them as such
Note that this still keeps the old behaviour of putting things in std by
default on serenity so the tools can be happy, but if USING_AK_GLOBALLY
is unset, AK behaves like a good citizen and doesn't try to put things
in the ::std namespace.

std::nothrow_t and its friends get to stay because I'm being told that
compilers assume things about them and I can't yeet them into a
different namespace...for now.
2022-12-14 11:44:32 +01:00
Marc Luqué 22f472249d AK: Introduce cutoff to insertion sort for Quicksort
Implement insertion sort in AK. The cutoff value 7 is a magic number
here, values [5, 15] should work well. Main idea of the cutoff is to
reduce recursion performed by quicksort to speed up sorting
of small partitions.
2022-12-12 15:03:57 +00:00
kleines Filmröllchen 16ca41ec10 AK: Add LexicalPath::is_child_of
This API checks whether this path is a child of (or the same as) another
path.
2022-12-11 16:05:23 +00:00
Maciej 58f5deba70 AK: Unref old m_data in String's move assignment
We were overridding the data pointer without unreffing it,
causing a memory leak when assigning a String.
2022-12-09 00:02:53 +01:00
Timothy Flynn 949f5460fb AK: Add formatters for Span<T> and Span<T const>
This generalizes the formatter currently used for Vector to be usable
for any Span.
2022-12-08 17:14:48 +01:00
Poseydon42 d2334957ba Tests: Add tests for Checked<> decrement operator 2022-12-08 07:20:14 -05:00
Andreas Kling a3e82eaad3 AK: Introduce the new String, replacement for DeprecatedString
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.

Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.

Problems in DeprecatedString:

- It assumes string allocation never fails. This makes it impossible
  to use in allocation-sensitive contexts, and is the reason we had to
  ban DeprecatedString from the kernel entirely.

- The awkward null state. DeprecatedString can be null. It's different
  from the empty state, although null strings are considered empty.
  All code is immediately nicer when using Optional<DeprecatedString>
  but DeprecatedString came before Optional, which is how we ended up
  like this.

- The encoding of the underlying data is ambiguous. For the most part,
  we use it as if it's always UTF-8, but there have been cases where
  we pass around strings in other encodings (e.g ISO8859-1)

- operator[] and length() are used to iterate over DeprecatedString one
  byte at a time. This is done all over the codebase, and will *not*
  give the right results unless the string is all ASCII.

How we solve these issues in the new String:

- Functions that may allocate now return ErrorOr<String> so that ENOMEM
  errors can be passed to the caller.

- String has no null state. Use Optional<String> when needed.

- String is always UTF-8. This is validated when constructing a String.
  We may need to add a bypass for this in the future, for cases where
  you have a known-good string, but for now: validate all the things!

- There is no operator[] or length(). You can get the underlying data
  with bytes(), but for iterating over code points, you should be using
  an UTF-8 iterator.

Furthermore, it has two nifty new features:

- String implements a small string optimization (SSO) for strings that
  can fit entirely within a pointer. This means up to 3 bytes on 32-bit
  platforms, and 7 bytes on 64-bit platforms. Such small strings will
  not be heap-allocated.

- String can create substrings without making a deep copy of the
  substring. Instead, the superstring gets +1 refcount from the
  substring, and it acts like a view into the superstring. To make
  substrings like this, use the substring_with_shared_superstring() API.

One caveat:

- String does not guarantee that the underlying data is null-terminated
  like DeprecatedString does today. While this was nifty in a handful of
  places where we were calling C functions, it did stand in the way of
  shared-superstring substrings.
2022-12-06 15:21:26 +01:00
Linus Groh 57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh 6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00