Commit graph

3400 commits

Author SHA1 Message Date
Tim Ledbetter 65827826fe AK: Add CharacterTypes::is_ascii_base36_digit()
This can be used to validate the string passed to
`parse_ascii_base36_digit()`.
2024-01-13 19:01:35 -07:00
Dan Klishch ccd701809f Everywhere: Add deprecated_ prefix to JsonValue::to_byte_string
`JsonValue::to_byte_string` has peculiar type-erasure semantics which is
not usually intended. Unfortunately, it also has a very stereotypical
name which does not warn about unexpected behavior. So let's prefix it
with `deprecated_` to make new code use `as_string` if it just wants to
get string value or `serialized<StringBuilder>` if it needs to do proper
serialization.
2024-01-12 17:41:34 -07:00
kleines Filmröllchen eada4f2ee8 AK: Remove ByteString from GenericLexer
A bunch of users used consume_specific with a constant ByteString
literal, which can be replaced by an allocation-free StringView literal.

The generic consume_while overload gains a requires clause so that
consume_specific("abc") causes a more understandable and actionable
error.
2024-01-12 17:03:53 -07:00
Martin Janiczek 5a8781393a AK: Cover TestComplex with more tests
Related:
- video detailing the process of writing these tests: https://www.youtube.com/watch?v=enxglLlALvI
- PR fixing bugs the above effort found: https://github.com/SerenityOS/serenity/pull/22025
2024-01-12 16:42:51 -07:00
Martin Janiczek d52ffcd830 LibTest: Add more numeric generators
Rename unsigned_int generator to number_u32.
Add generators:
- number_u64
- number_f64
- percentage
2024-01-12 16:42:51 -07:00
Andrew Kaster 09ce32039f AK: Use cast to const void pointer in to_readonly_span helper
This lets developers actually hex-dump print `Span<T const>` using the
helper as intended.
2024-01-06 10:13:14 +01:00
Timothy Flynn cae184d7cf AK: Improve performance of StringUtils::find_last
The current algorithm is currently O(N^2) because we forward-search an
ever-increasing substring of the haystack. This implementation reduces
the search time of a 500,000-length string (where the desired needle is
at index 0) from 72 seconds to 2-3 milliseconds.
2024-01-04 11:28:03 -05:00
Timothy Flynn 9cab4958e6 AK: Convert a couple String-related declarations to east-const
Caught by clang-format-17. Note that clang-format-16 is fine with this
as well (it leaves the const placement alone), it just doesn't perform
the formatting to east-const itself.
2024-01-04 11:28:03 -05:00
Timothy Flynn 1b4a23095c AK: Add a Utf16View::starts_with method
Based heavily on Utf8View::starts_with.
2024-01-04 12:43:10 +01:00
Timothy Flynn c46ba7e68d AK: Allow constructing a UTF-16 view from a UTF-16 string literal
UTF-16 string literals are a language-level feature. It is convenient to
be able to construct a Utf16View from these strings.
2024-01-04 12:43:10 +01:00
Aliaksandr Kalenik e394971209 AK+LibWeb: Use segmented vector to store commands in RecordingPainter
Using a vector to represent a list of painting commands results in many
reallocations, especially on pages with a lot of content.

This change addresses it by introducing a SegmentedVector, which allows
fast appending by representing a list as a sequence of fixed-size
vectors. Currently, this new data structure supports only the
operations used in RecordingPainter, which are appending and iterating.
2023-12-30 23:02:46 +01:00
Andreas Kling 7ad7ae7000 AK: Check URL parser input for invalid (tabs or spaces) in 1 pass
Combine 2 passes into 1 by iterating over the input once and checking
for both '\t' and '\n'.
2023-12-30 13:49:50 +01:00
Andreas Kling a19d8a4a37 AK: Add ASCII fast path to Utf8CodePointIterator
Much of the UTF-8 data that we'll iterate over will be ASCII only,
and we can get a significant speed-up by simply having a fast path
when the iterator points at a byte that is obviously an ASCII character
(<= 0x7F).
2023-12-30 13:49:50 +01:00
Andreas Kling 75cecd19a5 AK: Skip UTF-8 validation inside URL parser
Since we're already building up a percent-encoded ASCII-only string
in the internal parser buffer, there's no need to do a second UTF-8
validation pass before assigning each part of the parsed URL.

This makes URL parsing signficantly faster.
2023-12-30 13:49:50 +01:00
Andreas Kling f045a877b4 AK: Implement StringBuilder::append_code_point() more efficiently
Instead of do a wrappy MUST(try_append_code_point()), we now inline
the UTF-8 encoding logic. This allows us to grow the buffer by the
right increment up front, and also removes a bunch of ErrorOr ceremony
that we don't care about.
2023-12-30 13:49:50 +01:00
Andreas Kling bacbc376a0 AK: Make StringView::contains(StringView) faster for 1-byte needles
If we're looking for a 1-byte string, we can do the much simpler byte
scan by simply forwarding the call to StringView::contains(char).
2023-12-30 13:49:50 +01:00
Andreas Kling 6c51ba27a2 AK: Make URL percent encoding faster by exploiting ASCII knowledge
Once we know that a code point must be a valid ASCII character,
we now cast it to `char` and avoid the expensive generic
StringView::contains(u32 code_point) checks.

This dramatically speeds up URL parsing.
2023-12-30 13:49:50 +01:00
Andreas Kling 3c039903fb LibTextCodec+AK: Don't validate UTF-8 strings twice
UTF8Decoder was already converting invalid data into replacement
characters while converting, so we know for sure we have valid UTF-8
by the time conversion is finished.

This patch adds a new StringBuilder::to_string_without_validation()
and uses it to make UTF8Decoder avoid half the work it was doing.
2023-12-30 13:49:50 +01:00
Andreas Kling a285e36041 LibJS+AK: Make String.prototype.repeat() way faster
Instead of using a StringBuilder, add a String::repeated(String, N)
overload that takes advantage of knowing it's already all UTF-8.

This makes the following microbenchmark go 4x faster:

    "foo".repeat(100_000_000)

And for single character strings, we can even go 10x faster:

    "x".repeat(100_000_000)
2023-12-30 13:49:50 +01:00
Andrew Kaster 053e4d5e64 AK: Only try to print gettid() in dbgln on Linux and Serenity
On every other Unix, the relationship between thread id and process id
is not nearly as direct.
2023-12-29 09:46:50 +01:00
Timothy Flynn 507a5d8a07 AK: Add an option to zero-fill ByteBuffer data upon growth
This is to avoid UB in cases where we need to be able to read from the
buffer immediately after resizing it.
2023-12-27 19:30:39 +01:00
Shannon Booth d51f84501a AK: Remove now unused to_{int,uint,float,double} String functions 2023-12-23 20:41:07 +01:00
Shannon Booth e2e7c4d574 Everywhere: Use to_number<T> instead of to_{int,uint,float,double}
In a bunch of cases, this actually ends up simplifying the code as
to_number will handle something such as:

```
Optional<I> opt;
if constexpr (IsSigned<I>)
    opt = view.to_int<I>();
else
    opt = view.to_uint<I>();
```

For us.

The main goal here however is to have a single generic number conversion
API between all of the String classes.
2023-12-23 20:41:07 +01:00
Shannon Booth a4ecc65398 AK: Add DeprecatedFlyString::to_number<T> 2023-12-23 20:41:07 +01:00
Shannon Booth 159eda5c6d AK: Add ByteString::to_number<T>
To mirror the API with StringView and String.
2023-12-23 20:41:07 +01:00
Shannon Booth cdf84a3e36 AK: Implement StringView::to_number<T> from String::to_number<T>
Do exactly what String does, then use StringView's implementation as
String's new one. This should allow us to call to_number on a
StringView.
2023-12-23 20:41:07 +01:00
Ali Mohammad Pur 64616d3997 AK: Completely disable rich debug formats on Windows
Half the functions used are not readily available on windows, instead of
creating more ifdef soup, this commit simply disables the rich debug
stuff on windows.
2023-12-22 10:59:21 +01:00
Andreas Kling a264cf79c4 AK: Use fallback builtins for overflow checks in AK::Checked
If we don't have __builtin_add_overflow_p(), we can also try using
__builtin_add_overflow(). This makes debug builds with Clang
significantly faster since they no longer need to use the generic
implementation. Same for multiplication.
2023-12-21 15:31:32 +01:00
Andreas Kling 9f0aa08468 AK: Add ByteString::from_utf8_without_validation()
This will be used by Jakt to create ByteString from string literals
which we can validate at compile time instead of runtime. :^)
2023-12-21 13:49:41 +01:00
Andreas Kling b27a62488c AK: Add ByteString::must_from_utf8(StringView) for Jakt 2023-12-18 12:41:25 +01:00
Jesús "gsus" Lapastora 7578620f25 AK/StringUtils: Ensure needle positions don't overlap in replace
Previously, `replace` used `find_all` to find all of the positions to
replace. But `find_all` finds all the *overlapping* instances of the
needle, while `replace` assumed that the next position was always at
least `needle.length()` away from the last one. This led to crashes like
https://github.com/SerenityOS/jakt/issues/1159.
2023-12-17 12:00:48 -07:00
Ali Mohammad Pur 5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
Tim Schumacher e2d4952f0f AK: Add Array::from_repeated_value() 2023-12-14 08:59:23 -07:00
Andrew Kaster 4db5e2ba22 AK: Print timestamp, process name, and pid on all platforms
This requires duplicating some logic from Core::Process::get_name()
into AK, which seems unfortunate. But for now, this greatly improves the
log messages for testing Ladybird on Linux.

The feature is hidden behind a runtime flag with a global setter in the
same way that totally enabling/disabling dbgln is.
2023-12-12 10:11:24 -07:00
Simon Wanner 58f08107b0 AK+LibUnicode: Add Unicode::create_unicode_url
This is a workaround for the fact that AK::URLParser can't call into
LibUnicode directly.
2023-12-10 08:04:58 -05:00
Shannon Booth 73f7f33205 AK: Disallow calling FlyString::from_utf8 on FlyString and String 2023-12-10 09:45:03 +01:00
Shannon Booth 5f2f26451d AK: Disallow String::from_utf8 on FlyString and String 2023-12-10 09:45:03 +01:00
implicitfield 2de582afc4 AK: Make ByteBuffer's trim helper public 2023-12-08 22:05:43 +03:30
Bastiaan van der Plaat 4a7d3115c9 AK: Add String to number floating point support 2023-12-04 19:54:43 +00:00
Tim Schumacher e9e89d7e4e AK: Optimize BitStream refilling a bit further
This tries to optimize the refill code by making it easier to digest for
the branch predictor. This includes not looping as much across function
calls and marking our EOF case to be unlikely.

Co-Authored-By: Lucas Chollet <lucas.chollet@free.fr>
2023-12-01 12:48:18 +01:00
Tim Schumacher 197331c922 AK: Reject BitStream reads beyond EOF by default
The only exception to this is the lossless WebP decoder, which
legitimately relies on this behavior, even upstream.
2023-12-01 12:48:18 +01:00
Tim Schumacher cb03d3d78f AK: Allow rejecting BitStream reads beyond EOF 2023-12-01 12:48:18 +01:00
Tim Schumacher de49413bdf AK: Don't slice off the first byte of a BitStream read 2023-12-01 12:48:18 +01:00
Lucas CHOLLET aaf54f8cf8 AK: Allow Optional<T&> to be constructed by OptionalNone()
This is an extension of cc0b970d but for the reference-handling
specialization of Optional.

This basically allow us to write code like:
```cpp
Optional<u8&> opt;
opt = OptionalNone{};
```
2023-11-29 02:19:41 +03:30
Shannon Booth 6b32a1f18f AK+LibUnicode: Expose TrailingCodePointTransformation in to_titlecase
Relocating the definition of this enum from LibUnicode to AK.
2023-11-28 17:15:27 -05:00
Timothy Flynn 6aa334767f AK: Ensure assigned-to Strings are dereferenced if needed
If we assign to an existing non-short string, we must dereference its
StringData object to prevent leaking that data.
2023-11-28 16:38:18 +01:00
Michiel Visser 51fe8f820f AK: Fix compile error when using div_mod_internal<513, 256, true> 2023-11-27 09:43:07 +03:30
Dan Klishch 80d1c93edf AK+Applications: Return value from JsonObject::get_double more often
Previously, we were returning an empty optional if key contained a
numerical value which was not stored as double. Stop doing that and
rename the method to signify the change in the behavior.

Apparently, this fixes bug in an InspectorWidget in Ladybird on
Serenity: it showed 0 for element's boxes with integer sizes.
2023-11-25 11:02:17 +01:00
Andreas Kling a6106ca221 AK: Use __builtin_offsetof() + -Wno-invalid-offsetof to silence ASAN
ASAN was crying way too much when running the LibJS JIT since the old
OFFSET_OF implementation was too wild for its liking.

By turning off the invalid-offsetof warnings, we can use the offsetof
builtin instead. However, I'm leaving this as a wrapper macro, since
we may still want to do something different for other compilers.
2023-11-24 12:49:15 +01:00
timmot da3cfd5bbc AK+LibWeb: Make clamp_to_int generic over all integrals 2023-11-24 08:42:18 +01:00