Commit graph

170 commits

Author SHA1 Message Date
Andreas Kling 1e820385d9 AK: Add case-insensitive hashing for the new String classes
Bringing over this functionality from DeprecatedString.
2023-09-06 11:29:03 -04:00
Lucas CHOLLET fde26c53f0 AK: Remove the API to explicitly construct short strings
Now that ""_string is infallible, the only benefit of explicitly
constructing a short string is the ability to do it at compile-time. But
we never do that, so let's simplify the API and remove this
implementation detail from it.
2023-08-08 07:37:21 +02:00
Andreas Kling 34344120f2 AK: Make "foo"_string infallible
Stop worrying about tiny OOMs.

Work towards #20405.
2023-08-07 16:03:27 +02:00
aryanbaburajan a94c0eea94 AK: Add trim_ascii_whitespace method to String 2023-08-06 22:21:10 +02:00
Andrew Kaster 3533d3e452 AK: Enable consteval workaround for Android NDK
Android isn't shipping clang-15 yet in any NDK, so use the existing
workaround on that platform.
2023-07-19 04:22:28 -06:00
Andrew Kaster bfd6deed1e AK+Meta: Disable consteval completely when building for oss-fuzz
This was missed in 02b74e5a70

We need to disable consteval in AK::String as well as AK::StringView,
and we need to disable it when building both the tools build and the
fuzzer build.
2023-06-29 15:55:54 -06:00
Hendiadyoin1 ca0106ba1d AK: Forbid from_utf8 and from_deprecated_{...} with unintended types
Calling `from_utf8` with a DeprecatedString will hide the fact that we
have a DeprecatedString, while using `from_deprecated_string` with a
StringView will silently and needlessly allocate a DeprecatedString,
so let's forbid that.
2023-06-13 01:49:02 +02:00
Timothy Flynn d6b786b3fe AK: Use consteval String factories on macOS
Xcode 14.3 ships with clang 15, which supports our usage of consteval to
validate short strings at compile time.
2023-05-08 20:54:31 -06:00
thankyouverycool 9a03e4dd73 AK: Add count() helper to String 2023-04-30 05:48:14 +02:00
Andreas Kling d517e7fb3a AK: Make FlyString::hash() use the cached hash in StringData if possible
This avoids rehashing the string every time.
2023-03-09 21:54:59 +01:00
Timothy Flynn 1393ed2000 AK+LibUnicode: Implement String::equals_ignoring_case without allocating
We currently fully casefold the left- and right-hand sides to compare
two strings with case-insensitivity. Now, we casefold one code point at
a time, storing the result in a view for comparison, until we exhaust
both strings.
2023-03-08 18:57:53 +00:00
Timothy Flynn 515fca4f7a AK: Make String::contains(code_point) handle non-ASCII
We currently only accept a char, instead of a full code point.
2023-03-08 14:16:47 +00:00
Timothy Flynn f882581e91 AK: Make String::{starts,ends}_with(code_point) handle non-ASCII
We currently pass the code point to StringView::{starts,ends}_with,
which actually accepts a single char, thus cannot handle non-ASCII
code points.
2023-03-08 14:16:47 +00:00
Timothy Flynn da0d000909 AK: Ensure short String instances are valid UTF-8
We are currently only validating long strings.
2023-03-03 11:46:42 -05:00
Linus Groh 45dc3d8a3e AK: Add String::ends_with{,_bytes}() 2023-03-03 11:02:21 +00:00
Ali Mohammad Pur 79e4027480 AK: Add two starts_with{bytes,}() APIs to String 2023-02-28 15:52:24 +03:30
Timothy Flynn 5eec76b441 AK: Use the same consteval condition on _short_string as its factory
This fixes the build with Apple Clang.
2023-02-25 22:25:05 +01:00
Linus Groh 85414d9338 AK: Add operator""_{short_,}string to create a String from a literal
We briefly discussed this when adding the new String type but couldn't
settle on a name. However, having to use String::from_utf8() on every
literal string is a bit unwieldy, so let's have these options available!

Naming-wise '_string' is not as short as 'sv' but should be relatively
clear; it also matches '_bigint' and '_ubigint' in length.
'_short_string' may be longer than the actual string itself, but it's
still an improvement over the static function :^)

Since our C++ source files are UTF-8 encoded anyway, it should be
impossible to create a string literal with invalid UTF-8, so including
that in the name is not as important as in the function that can receive
arbitrary data.
2023-02-25 20:51:49 +01:00
Andrew Kaster 0ea697ace5 AK: Add String::from_stream method
The caller is responsible for determining how long the string is that
they want to read.
2023-02-21 10:57:44 +01:00
Andreas Kling e08c55dd8d AK: Make String const-correct internally 2023-02-21 00:54:04 +01:00
nipos c31b547fae AK: Use constexpr instead of consteval on OpenBSD 2023-02-04 16:11:54 -07:00
Timothy Flynn c59268d15b AK: Add String::trim 2023-01-28 00:13:46 +00:00
Timothy Flynn cccaa94767 AK: Add String::join 2023-01-28 00:13:46 +00:00
Timothy Flynn c35b1371a3 AK: Add an overload of String::find_byte_offset for StringView 2023-01-27 18:00:17 +00:00
Timothy Flynn 76fd5f2756 AK: Add convenience substring wrappers to String to exclude a length
These overloads exist on other string classes and are used throughout
the code base.
2023-01-24 16:23:50 -05:00
Timothy Flynn 427b82065c AK: Add a method to create a String with a repeated code point 2023-01-24 16:23:50 -05:00
Timothy Flynn d50724956e AK: Add a method to find the byte offset of a code point 2023-01-24 16:23:50 -05:00
Timothy Flynn 5e44b93af2 AK: Remove [[nodiscard]] attribute from String methods returning ErrorOr 2023-01-24 16:23:50 -05:00
Timothy Flynn 12c8bc3e85 AK: Add a String factory to create a string from a single code point 2023-01-22 01:03:13 +00:00
Timothy Flynn 8aca8e82cb AK: Change String's default constructor to be constant
This allows creating expressions such as:

    constexpr Array<String, 10> {};
2023-01-22 01:03:13 +00:00
martinfalisse aec2dadfdd AK: Add split() for String 2023-01-21 14:35:00 +01:00
Timothy Flynn c8e25a71e0 AK: Disable use of consteval in String::from_utf8_short_string for Apple
This causes an ICE on older versions of clang, and Apple's clang is
currently based on such a version.
2023-01-20 20:33:04 +00:00
Timothy Flynn d48266a420 AK: Support creating known short string literals at compile time
In cases where we know a string literal will fit in the short string
storage, we can do so at compile time without needing to handle error
propagation. If the provided string literal is too long, a compilation
error will be emitted due to the failed VERIFY statement being a non-
constant expression.
2023-01-20 14:24:12 -05:00
Timothy Flynn 537fcaf59e AK+LibUnicode: Provide Unicode-aware caseless String matching
The Unicode spec defines much more complicated caseless matching
algorithms in its Collation spec. This implements the "basic" case
folding comparison.
2023-01-18 14:43:40 +00:00
Timothy Flynn d6ddca0c0f AK+LibUnicode: Provide Unicode-aware String titlecase transformation 2023-01-16 18:33:44 -05:00
Timothy Flynn 63c814fa2f AK: Add String::to_number 2023-01-15 01:00:20 +00:00
Timothy Flynn cf0899f440 AK: Add String::contains 2023-01-15 01:00:20 +00:00
Timothy Flynn bd9b65e82f AK: Add String::is_one_of for variadic string comparison 2023-01-15 01:00:20 +00:00
Timothy Flynn 9db9b2f9be AK: Add a somewhat naive implementation of String::reverse
This will reverse the String's code points (i.e. not just its bytes),
but is not aware of grapheme clusters.
2023-01-15 01:00:20 +00:00
MacDue 9a120d7243 AK: Add support for "debug only" formatters
These are formatters that can only be used with debug print
functions, such as dbgln(). Currently this is limited to
Formatter<ErrorOr<T>>. With this you can still debug log ErrorOr
values (good for debugging), but trying to use them in any
String::formatted() call will fail (which prevents .to_string()
errors with the new failable strings being ignored).

You make a formatter debug only by adding a constexpr method like:
static constexpr bool is_debug_only() { return true; }
2023-01-13 21:09:26 +00:00
Timothy Flynn 1d4f287582 AK: Implement FlyString for the new String class
This implements a FlyString that will de-duplicate String instances. The
FlyString will store the raw encoded data of the String instance: If the
String is a short string, FlyString holds the String::ShortString bytes;
otherwise FlyString holds a pointer to the Detail::StringData.

FlyString itself does not know about String's storage or how to refcount
its Detail::StringData. It defers to String to implement these details.
2023-01-12 11:23:58 +01:00
Timothy Flynn 6fcc1c7426 AK+LibUnicode: Provide Unicode-aware String case transformations
Since AK can't refer to LibUnicode directly, the strategy here is that
if you need case transformations, you can link LibUnicode and receive
them. If you try to use either of these methods without linking it, then
you'll of course get a linker error (note we don't do any fallbacks to
e.g. ASCII case transformations). If you don't need these methods, you
don't have to link LibUnicode.
2023-01-09 19:23:46 -07:00
kleines Filmröllchen ca80353efe AK: Add comparison operator
s p a c e s h i p  o p e r a t o r

Comparing UTF-8 can be done by simple byte lexicographic comparison per
definition, so we just piggy-back on StringView's high-performance
comparator.
2022-12-11 16:05:23 +00:00
Moustafa Raafat ae2abcebbb Everywhere: Use C++ concepts instead of requires clauses 2022-12-09 11:25:30 +00:00
Andreas Kling a3e82eaad3 AK: Introduce the new String, replacement for DeprecatedString
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.

Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.

Problems in DeprecatedString:

- It assumes string allocation never fails. This makes it impossible
  to use in allocation-sensitive contexts, and is the reason we had to
  ban DeprecatedString from the kernel entirely.

- The awkward null state. DeprecatedString can be null. It's different
  from the empty state, although null strings are considered empty.
  All code is immediately nicer when using Optional<DeprecatedString>
  but DeprecatedString came before Optional, which is how we ended up
  like this.

- The encoding of the underlying data is ambiguous. For the most part,
  we use it as if it's always UTF-8, but there have been cases where
  we pass around strings in other encodings (e.g ISO8859-1)

- operator[] and length() are used to iterate over DeprecatedString one
  byte at a time. This is done all over the codebase, and will *not*
  give the right results unless the string is all ASCII.

How we solve these issues in the new String:

- Functions that may allocate now return ErrorOr<String> so that ENOMEM
  errors can be passed to the caller.

- String has no null state. Use Optional<String> when needed.

- String is always UTF-8. This is validated when constructing a String.
  We may need to add a bypass for this in the future, for cases where
  you have a known-good string, but for now: validate all the things!

- There is no operator[] or length(). You can get the underlying data
  with bytes(), but for iterating over code points, you should be using
  an UTF-8 iterator.

Furthermore, it has two nifty new features:

- String implements a small string optimization (SSO) for strings that
  can fit entirely within a pointer. This means up to 3 bytes on 32-bit
  platforms, and 7 bytes on 64-bit platforms. Such small strings will
  not be heap-allocated.

- String can create substrings without making a deep copy of the
  substring. Instead, the superstring gets +1 refcount from the
  substring, and it acts like a view into the superstring. To make
  substrings like this, use the substring_with_shared_superstring() API.

One caveat:

- String does not guarantee that the underlying data is null-terminated
  like DeprecatedString does today. While this was nifty in a handful of
  places where we were calling C functions, it did stand in the way of
  shared-superstring substrings.
2022-12-06 15:21:26 +01:00
Linus Groh 6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
Linus Groh d26aabff04 Everywhere: Run clang-format 2022-12-03 23:52:23 +00:00
Andreas Kling ae3ffdd521 AK: Make it possible to not using AK classes into the global namespace
This patch adds the `USING_AK_GLOBALLY` macro which is enabled by
default, but can be overridden by build flags.

This is a step towards integrating Jakt and AK types.
2022-11-26 15:51:34 +01:00
Daniel Bertalan 4296425bd8 Everywhere: Remove redundant inequality comparison operators
C++20 can automatically synthesize `operator!=` from `operator==`, so
there is no point in writing such functions by hand if all they do is
call through to `operator==`.

This fixes a compile error with compilers that implement P2468 (Clang
16 currently). This paper restores the C++17 behavior that if both
`T::operator==(U)` and `T::operator!=(U)` exist, `U == T` won't be
rewritten in reverse to call `T::operator==(U)`. Removing `!=` operators
makes the rewriting possible again.
See https://reviews.llvm.org/D134529#3853062
2022-11-06 10:25:08 -07:00
demostanis 3e8b5ac920 AK+Everywhere: Turn bool keep_empty to an enum in split* functions 2022-10-24 23:29:18 +01:00