Commit graph

144 commits

Author SHA1 Message Date
Timothy Flynn ef275e25b8 AK: Reduce String's allocated data by one byte
This was copied from allocation_size_for_stringimpl, which had to ensure
the string is null-terminated. String makes no such guarantee.
2023-01-22 20:27:52 +00:00
Timothy Flynn 8aca8e82cb AK: Change String's default constructor to be constant
This allows creating expressions such as:

    constexpr Array<String, 10> {};
2023-01-22 01:03:13 +00:00
martinfalisse aec2dadfdd AK: Add split() for String 2023-01-21 14:35:00 +01:00
Timothy Flynn d48266a420 AK: Support creating known short string literals at compile time
In cases where we know a string literal will fit in the short string
storage, we can do so at compile time without needing to handle error
propagation. If the provided string literal is too long, a compilation
error will be emitted due to the failed VERIFY statement being a non-
constant expression.
2023-01-20 14:24:12 -05:00
Timothy Flynn cf0899f440 AK: Add String::contains 2023-01-15 01:00:20 +00:00
Timothy Flynn 9db9b2f9be AK: Add a somewhat naive implementation of String::reverse
This will reverse the String's code points (i.e. not just its bytes),
but is not aware of grapheme clusters.
2023-01-15 01:00:20 +00:00
Timothy Flynn 1d4f287582 AK: Implement FlyString for the new String class
This implements a FlyString that will de-duplicate String instances. The
FlyString will store the raw encoded data of the String instance: If the
String is a short string, FlyString holds the String::ShortString bytes;
otherwise FlyString holds a pointer to the Detail::StringData.

FlyString itself does not know about String's storage or how to refcount
its Detail::StringData. It defers to String to implement these details.
2023-01-12 11:23:58 +01:00
Ben Wiederhake 65b420f996 Everywhere: Remove unused includes of AK/Memory.h
These instances were detected by searching for files that include
AK/Memory.h, but don't match the regex:

\\b(fast_u32_copy|fast_u32_fill|secure_zero|timing_safe_compare)\\b

This regex is pessimistic, so there might be more files that don't
actually use any memory function.

In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
2023-01-02 20:27:20 -05:00
kleines Filmröllchen f502e24bd7 AK: Change the moved-from String state to the empty short string
The previous moved-from state was the null string. This violates both
our invariant that String is never null, and also the C++ contract that
the moved-from state must be valid but unspecified. The empty short
string state is of course valid, so it satisfies both invariants. It
also allows us to remove any extra checks for the null state.

The reason this change is made is primarily because swap() requires
moved-from objects to be reassignable (C++ allows this). Because the
move assignment of String would not check the null state, it crashed
trying to increment the data reference count (nullptr signals a
non-short string). This meant that e.g. quick_sort'ing String would
crash immediately.
2022-12-11 16:05:23 +00:00
Maciej 58f5deba70 AK: Unref old m_data in String's move assignment
We were overridding the data pointer without unreffing it,
causing a memory leak when assigning a String.
2022-12-09 00:02:53 +01:00
Andreas Kling a3e82eaad3 AK: Introduce the new String, replacement for DeprecatedString
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.

Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.

Problems in DeprecatedString:

- It assumes string allocation never fails. This makes it impossible
  to use in allocation-sensitive contexts, and is the reason we had to
  ban DeprecatedString from the kernel entirely.

- The awkward null state. DeprecatedString can be null. It's different
  from the empty state, although null strings are considered empty.
  All code is immediately nicer when using Optional<DeprecatedString>
  but DeprecatedString came before Optional, which is how we ended up
  like this.

- The encoding of the underlying data is ambiguous. For the most part,
  we use it as if it's always UTF-8, but there have been cases where
  we pass around strings in other encodings (e.g ISO8859-1)

- operator[] and length() are used to iterate over DeprecatedString one
  byte at a time. This is done all over the codebase, and will *not*
  give the right results unless the string is all ASCII.

How we solve these issues in the new String:

- Functions that may allocate now return ErrorOr<String> so that ENOMEM
  errors can be passed to the caller.

- String has no null state. Use Optional<String> when needed.

- String is always UTF-8. This is validated when constructing a String.
  We may need to add a bypass for this in the future, for cases where
  you have a known-good string, but for now: validate all the things!

- There is no operator[] or length(). You can get the underlying data
  with bytes(), but for iterating over code points, you should be using
  an UTF-8 iterator.

Furthermore, it has two nifty new features:

- String implements a small string optimization (SSO) for strings that
  can fit entirely within a pointer. This means up to 3 bytes on 32-bit
  platforms, and 7 bytes on 64-bit platforms. Such small strings will
  not be heap-allocated.

- String can create substrings without making a deep copy of the
  substring. Instead, the superstring gets +1 refcount from the
  substring, and it acts like a view into the superstring. To make
  substrings like this, use the substring_with_shared_superstring() API.

One caveat:

- String does not guarantee that the underlying data is null-terminated
  like DeprecatedString does today. While this was nifty in a handful of
  places where we were calling C functions, it did stand in the way of
  shared-superstring substrings.
2022-12-06 15:21:26 +01:00
Linus Groh 6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
demostanis 7c33f8f7df AK: Add SplitBehavior::KeepTrailingSeparator with tests 2022-10-24 23:29:18 +01:00
demostanis 3e8b5ac920 AK+Everywhere: Turn bool keep_empty to an enum in split* functions 2022-10-24 23:29:18 +01:00
davidot 6fd8e96d53 AK: Add to_{double, float} convenience functions to all string types
These are guarded with #ifndef KERNEL, since doubles (and floats) are
not allowed in KERNEL mode.
In StringUtils there is convert_to_floating_point which does have a
template parameter incase you have a templated type.
2022-10-23 15:48:45 +02:00
sin-ack 3f8060d859 AK: Remove String <-> char const* comparison operators
During the removal of StringView(char const*), all users of these
functions were removed, and they are of dubious value (relying on
implicit StringView conversion).
2022-07-12 23:11:35 +02:00
sin-ack 3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
huttongrabiel 8ffa860bc3 AK: Add invert_case() and invert_case(StringView)
In the given String, invert_case() swaps lowercase characters with
uppercase ones and vice versa.
2022-05-26 21:51:23 +01:00
Ali Mohammad Pur dd370fcdd1 AK: Explicitly instantiate String::to_uint<unsigned long{, long}>()
Instead of just to_uint<u64>().
2022-04-20 00:15:23 +04:30
Idan Horowitz 086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Andreas Kling dd7eb3d6d8 AK: Add String::split_view(Function<bool(char)>)
This allows you to split around a custom separator, and enables
expressive code like this:

    string.split_view(is_ascii_space);
2022-02-25 19:38:31 +01:00
Daniel Bertalan 1a4aad9b7e AK: Implement String's comparison operators in terms of StringView's 2022-01-29 23:08:27 +01:00
Sam Atkins 45cf40653a Everywhere: Convert ByteBuffer factory methods from Optional -> ErrorOr
Apologies for the enormous commit, but I don't see a way to split this
up nicely. In the vast majority of cases it's a simple change. A few
extra places can use TRY instead of manual error checking though. :^)
2022-01-24 22:36:09 +01:00
Matt Jacobson 47e8d58553 AK: Fix logic in String::operator>(const String&)
Null strings should not compare greater than non-null strings.

Add tests for >, <, >=, and <= comparison involving null strings.
2022-01-16 11:08:23 +01:00
Andreas Kling 216e21a1fa AK: Convert AK::Format formatting helpers to returning ErrorOr<void>
This isn't a complete conversion to ErrorOr<void>, but a good chunk.
The end goal here is to propagate buffer allocation failures to the
caller, and allow the use of TRY() with formatting functions.
2021-11-17 00:21:13 +01:00
Andreas Kling 8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Andreas Kling 5f7d008791 AK+Everywhere: Stop including Vector.h from StringView.h
Preparation for using Error.h from Vector.h. This required moving some
things out of line.
2021-11-10 21:58:58 +01:00
Peter Elliott d28459fb11 AK: Escape '"' in escape_html_entities 2021-09-12 12:17:16 +02:00
Idan Horowitz 6704961c82 AK: Replace the mutable String::replace API with an immutable version
This removes the awkward String::replace API which was the only String
API which mutated the String and replaces it with a new immutable
version that returns a new String with the replacements applied. This
also fixes a couple of UAFs that were caused by the use of this API.

As an optimization an equivalent StringView::replace API was also added
to remove an unnecessary String allocations in the format of:
`String { view }.replace(...);`
2021-09-11 20:36:43 +03:00
Idan Horowitz 6d2b003b6e AK: Make String::count not use strstr and take a StringView
This was needlessly copying StringView arguments, and was also using
strstr internally, which meant it was doing a bunch of unnecessary
strlen calls on it. This also moves the implementation to StringUtils
to allow API consistency between String and StringView.
2021-09-11 20:36:43 +03:00
Ali Mohammad Pur 97e97bccab Everywhere: Make ByteBuffer::{create_*,copy}() OOM-safe 2021-09-06 01:53:26 +02:00
Brian Gianforcaro fee2a03ba9 AK: Pass AK::Format TypeErasedFormatParams by reference in AK::String
This silences a overeager warning in sonar cloud, warning that
slicing could occur with `VariadicFormatParams` which derives from
`TypeErasedFormatParams`.

Reference:
https://sonarcloud.io/project/issues?id=SerenityOS_serenity&issues=AXuVPBW3k92xXUF3qXTE&open=AXuVPBW3k92xXUF3qXTE

This is a continuation of f0b3aa0331.
2021-09-01 18:06:14 +02:00
Timothy Flynn 262e412634 AK: Implement method to convert a String/StringView to title case
This implementation preserves consecutive spaces in the orginal string.
2021-08-26 22:04:09 +01:00
Jean-Baptiste Boric 7a9d05c24c AK: Add contains(char) method to String 2021-08-12 00:41:13 +02:00
Tobias Christiansen 87033ce7d1 AK: Add generation of roman numerals to AK::String
We now can generate roman numbers using String::roman_number_from()
similar to String::bijective_base_from().
2021-07-04 22:17:03 +02:00
Max Wipfli 17eddf3ac4 AK: Add input bounds checking to String::substring()
This checks for overflow in String::substring(). It also rearranges some
declarations in the header.
2021-07-02 21:54:21 +02:00
Max Wipfli 268d81a56c AK: Add String::find_last() and inline String::find() methods
This adds the String::find_last() as wrapper for StringUtils::find_last,
which is another step in harmonizing the String and StringView APIs
where possible.

This also inlines the find() methods, as they are simple wrappers around
StringUtils functions without any additional logic.
2021-07-02 21:54:21 +02:00
Max Wipfli d7a104c27c AK: Implement StringView::find_all()
This implements the StringView::find_all() method by re-implemeting the
current method existing for String in StringUtils, and using that
implementation for both String and StringView.

The rewrite uses memmem() instead of strstr(), so the String::find_all()
argument type has been changed from String to StringView, as the null
byte is no longer required.
2021-07-02 21:54:21 +02:00
Max Wipfli 56253bf389 AK: Reimplement StringView::find methods in StringUtils
This patch reimplements the StringView::find methods in StringUtils, so
they can also be used by String. The methods now also take an optional
start parameter, which moves their API in line with String's respective
methods.

This also implements a StringView::find_ast(char) method, which is
currently functionally equivalent to find_last_of(char). This is because
find_last_of(char) will be removed in a further commit.
2021-07-02 21:54:21 +02:00
Idan Horowitz a768131720 AK: Ensure StringBuilder capacity in String::reverse 2021-06-29 16:55:54 +01:00
sin-ack 3abcfcc178 AK: Add a way to disable the trimming of whitespace in to_*int
This behavior might not always be desirable, and so this patch adds a
way to disable it.
2021-06-18 19:18:15 +01:00
Gunnar Beutner a4f320c76b AK: Allow inlining more string functions 2021-06-03 08:06:51 +02:00
Matthew Olsson 777c232e16 AK: Add String::repeated(StringView, size_t count) 2021-05-25 00:24:09 +04:30
Andreas Kling de395a3df2 AK+Everywhere: Consolidate String::index_of() and String::find()
We had two functions for doing mostly the same thing. Combine both
of them into String::find() and use that everywhere.

Also add some tests to cover basic behavior.
2021-05-24 11:59:18 +02:00
Maciej Zygmanowski 80077cea86 AK: Add String::find_all() and String::count() 2021-05-19 20:51:51 +01:00
Tobias Christiansen 4016e04061 AK: Move bijective-base-conversion into AK/String
This allows everybody to create a String version of their number
in a arbitrary bijective base. Bijective base meaning that the mapping
doesn't have a 0. In the usual mapping to the alphabet the follower
after 'Z' is 'AA'.
The mapping using the (uppercase) alphabet is used as a standard but
can be overridden specifying 'base' and 'map'.

The code was directly yanked from the Spreadsheet.
2021-05-01 01:19:40 +02:00
Brian Gianforcaro 1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Andreas Kling edf0b14e23 AK: Remove String::format()
There are no more clients of this function, everyone has been converted
to String::formatted().
2021-04-21 23:49:03 +02:00
Andreas Kling 873da38d0e AK: Remove String-from-StringView optimization
We had an unusual optimization in AK::StringView where constructing
a StringView from a String would cause it to remember the internal
StringImpl pointer of the String.

This was used to make constructing a String from a StringView fast
and copy-free.

I tried removing this optimization and indeed we started seeing a
ton of allocation traffic. However, all of it was due to a silly
pattern where functions would take a StringView and then go on
to create a String from it.

I've gone through most of the code and updated those functions to
simply take a String directly instead, which now makes this
optimization unnecessary, and indeed a source of bloat instead.

So, let's get rid of it and make StringView a little smaller. :^)
2021-04-17 01:27:31 +02:00
Andreas Kling 5d180d1f99 Everywhere: Rename ASSERT => VERIFY
(...and ASSERT_NOT_REACHED => VERIFY_NOT_REACHED)

Since all of these checks are done in release builds as well,
let's rename them to VERIFY to prevent confusion, as everyone is
used to assertions being compiled out in release.

We can introduce a new ASSERT macro that is specifically for debug
checks, but I'm doing this wholesale conversion first since we've
accumulated thousands of these already, and it's not immediately
obvious which ones are suitable for ASSERT.
2021-02-23 20:56:54 +01:00