Commit graph

68 commits

Author SHA1 Message Date
Timothy Flynn cae184d7cf AK: Improve performance of StringUtils::find_last
The current algorithm is currently O(N^2) because we forward-search an
ever-increasing substring of the haystack. This implementation reduces
the search time of a 500,000-length string (where the desired needle is
at index 0) from 72 seconds to 2-3 milliseconds.
2024-01-04 11:28:03 -05:00
Timothy Flynn 9cab4958e6 AK: Convert a couple String-related declarations to east-const
Caught by clang-format-17. Note that clang-format-16 is fine with this
as well (it leaves the const placement alone), it just doesn't perform
the formatting to east-const itself.
2024-01-04 11:28:03 -05:00
Jesús "gsus" Lapastora 7578620f25 AK/StringUtils: Ensure needle positions don't overlap in replace
Previously, `replace` used `find_all` to find all of the positions to
replace. But `find_all` finds all the *overlapping* instances of the
needle, while `replace` assumed that the next position was always at
least `needle.length()` away from the last one. This led to crashes like
https://github.com/SerenityOS/jakt/issues/1159.
2023-12-17 12:00:48 -07:00
Ali Mohammad Pur 5e1499d104 Everywhere: Rename {Deprecated => Byte}String
This commit un-deprecates DeprecatedString, and repurposes it as a byte
string.
As the null state has already been removed, there are no other
particularly hairy blockers in repurposing this type as a byte string
(what it _really_ is).

This commit is auto-generated:
  $ xs=$(ack -l \bDeprecatedString\b\|deprecated_string AK Userland \
    Meta Ports Ladybird Tests Kernel)
  $ perl -pie 's/\bDeprecatedString\b/ByteString/g;
    s/deprecated_string/byte_string/g' $xs
  $ clang-format --style=file -i \
    $(git diff --name-only | grep \.cpp\|\.h)
  $ gn format $(git ls-files '*.gn' '*.gni')
2023-12-17 18:25:10 +03:30
hanaa12G 54e1470467 AK: Pass correct length to StringUtils::convert_to_floating_point()
Fixed the issue in StringUtils::convert_to_floating_point() where the
end pointer of the trimmed string was not being passed, causing the
function to consistently return 'None' when given strings with trailing
whitespaces.
2023-10-22 00:22:29 +02:00
Dan Klishch 3556c27d2d AK: Add StringView::count(char) 2023-08-18 08:58:51 +03:30
Liav A 7c0540a229 Everywhere: Move global Kernel pattern code to Kernel/Library directory
This has KString, KBuffer, DoubleBuffer, KBufferBuilder, IOWindow,
UserOrKernelBuffer and ScopedCritical classes being moved to the
Kernel/Library subdirectory.

Also, move the panic and assertions handling code to that directory.
2023-06-04 21:32:34 +02:00
Andreas Kling a504ac3e2a Everywhere: Rename equals_ignoring_case => equals_ignoring_ascii_case
Let's make it clear that these functions deal with ASCII case only.
2023-03-10 13:15:44 +01:00
Linus Groh 6e7459322d AK: Remove StringBuilder::build() in favor of to_deprecated_string()
Having an alias function that only wraps another one is silly, and
keeping the more obvious name should flush out more uses of deprecated
strings.
No behavior change.
2023-01-27 20:38:49 +00:00
Andrew Kaster 7ab37ee22c Everywhere: Remove string.h include from AK/Traits.h and resolve fallout
A lot of places were relying on AK/Traits.h to give it strnlen, memcmp,
memcpy and other related declarations.

In the quest to remove inclusion of LibC headers from Kernel files, deal
with all the fallout of this included-everywhere header including less
things.
2023-01-21 10:43:59 -07:00
Ben Wiederhake 65b420f996 Everywhere: Remove unused includes of AK/Memory.h
These instances were detected by searching for files that include
AK/Memory.h, but don't match the regex:

\\b(fast_u32_copy|fast_u32_fill|secure_zero|timing_safe_compare)\\b

This regex is pessimistic, so there might be more files that don't
actually use any memory function.

In theory, one might use LibCPP to detect things like this
automatically, but let's do this one step after another.
2023-01-02 20:27:20 -05:00
Florian Cramer af2ffcaba8 AK: Make StringUtils::matches() handle escaping correctly
Previously any backslash and the character following it were ignored.
This commit adds a fall through to match the character following the
backslash without checking whether it is "special".
2022-12-27 07:28:25 +03:30
Agustin Gianni 9a2ee5a9dd AK: Add DeprecatedString::find_last(StringView)
This adds the the method DeprecatedString::find_last() as wrapper for
StringUtils::find_last for the StringView type.
2022-12-20 11:24:05 +01:00
Timothy Flynn d28c9ba054 AK: Synchronize explicit instantiations of to_int and to_uint
1. Ensure long and long long are instantiated for to_int.
2. Ensure long and long long are not instantiated for to_uint.
2022-12-16 10:06:26 +01:00
Andreas Kling a3e82eaad3 AK: Introduce the new String, replacement for DeprecatedString
DeprecatedString (formerly String) has been with us since the start,
and it has served us well. However, it has a number of shortcomings
that I'd like to address.

Some of these issues are hard if not impossible to solve incrementally
inside of DeprecatedString, so instead of doing that, let's build a new
String class and then incrementally move over to it instead.

Problems in DeprecatedString:

- It assumes string allocation never fails. This makes it impossible
  to use in allocation-sensitive contexts, and is the reason we had to
  ban DeprecatedString from the kernel entirely.

- The awkward null state. DeprecatedString can be null. It's different
  from the empty state, although null strings are considered empty.
  All code is immediately nicer when using Optional<DeprecatedString>
  but DeprecatedString came before Optional, which is how we ended up
  like this.

- The encoding of the underlying data is ambiguous. For the most part,
  we use it as if it's always UTF-8, but there have been cases where
  we pass around strings in other encodings (e.g ISO8859-1)

- operator[] and length() are used to iterate over DeprecatedString one
  byte at a time. This is done all over the codebase, and will *not*
  give the right results unless the string is all ASCII.

How we solve these issues in the new String:

- Functions that may allocate now return ErrorOr<String> so that ENOMEM
  errors can be passed to the caller.

- String has no null state. Use Optional<String> when needed.

- String is always UTF-8. This is validated when constructing a String.
  We may need to add a bypass for this in the future, for cases where
  you have a known-good string, but for now: validate all the things!

- There is no operator[] or length(). You can get the underlying data
  with bytes(), but for iterating over code points, you should be using
  an UTF-8 iterator.

Furthermore, it has two nifty new features:

- String implements a small string optimization (SSO) for strings that
  can fit entirely within a pointer. This means up to 3 bytes on 32-bit
  platforms, and 7 bytes on 64-bit platforms. Such small strings will
  not be heap-allocated.

- String can create substrings without making a deep copy of the
  substring. Instead, the superstring gets +1 refcount from the
  substring, and it acts like a view into the superstring. To make
  substrings like this, use the substring_with_shared_superstring() API.

One caveat:

- String does not guarantee that the underlying data is null-terminated
  like DeprecatedString does today. While this was nifty in a handful of
  places where we were calling C functions, it did stand in the way of
  shared-superstring substrings.
2022-12-06 15:21:26 +01:00
Linus Groh 57dc179b1f Everywhere: Rename to_{string => deprecated_string}() where applicable
This will make it easier to support both string types at the same time
while we convert code, and tracking down remaining uses.

One big exception is Value::to_string() in LibJS, where the name is
dictated by the ToString AO.
2022-12-06 08:54:33 +01:00
Linus Groh 6e19ab2bbc AK+Everywhere: Rename String to DeprecatedString
We have a new, improved string type coming up in AK (OOM aware, no null
state), and while it's going to use UTF-8, the name UTF8String is a
mouthful - so let's free up the String name by renaming the existing
class.
Making the old one have an annoying name will hopefully also help with
quick adoption :^)
2022-12-06 08:54:33 +01:00
davidot 6fd8e96d53 AK: Add to_{double, float} convenience functions to all string types
These are guarded with #ifndef KERNEL, since doubles (and floats) are
not allowed in KERNEL mode.
In StringUtils there is convert_to_floating_point which does have a
template parameter incase you have a templated type.
2022-10-23 15:48:45 +02:00
Timothy Flynn b9dc0b7d1b AK: Do not append string bytes as code points when title-casing a string
By appending individual bytes as code points, we were "breaking apart"
multi-byte UTF-8 code points. This now behaves the same way as the
invert_case() helper in StringUtils.
2022-10-20 18:55:43 +02:00
Undefine 9718667bcf AK: Add StringView::find_last_not 2022-10-14 18:36:40 -06:00
demostanis aa788581f2 AK: Make StringUtils::matches() handle escaping 2022-10-14 13:37:29 +02:00
Sam Atkins a0d44026fc AK+Tests: Correct off-by-one error when right-trimming text
If the entire string you want to right-trim consists of characters you
want to remove, we previously would incorrectly leave the first
character there.

For example: `trim("aaaaa", "a")` would return "a" instead of "".

We can't use `i >= 0` in the loop since that would fail to detect
underflow, so instead we keep `i` in the range `size .. 1` and then
subtract 1 from it when reading the character.

Added some trim() tests while I was at it. (And to confirm that this was
the issue.)
2022-10-11 17:49:32 +02:00
sin-ack 3f3f45580a Everywhere: Add sv suffix to strings relying on StringView(char const*)
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).

No functional changes.
2022-07-12 23:11:35 +02:00
DexesTTP 7ceeb74535 AK: Use an enum instead of a bool for String::replace(all_occurences)
This commit has no behavior changes.

In particular, this does not fix any of the wrong uses of the previous
default parameter (which used to be 'false', meaning "only replace the
first occurence in the string"). It simply replaces the default uses by
String::replace(..., ReplaceMode::FirstOnly), leaving them incorrect.
2022-07-06 11:12:45 +02:00
huttongrabiel 8ffa860bc3 AK: Add invert_case() and invert_case(StringView)
In the given String, invert_case() swaps lowercase characters with
uppercase ones and vice versa.
2022-05-26 21:51:23 +01:00
Idan Horowitz 086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
Sam Atkins 7e98c8eaf6 AK+Tests: Fix StringUtils::contains() being confused by repeating text
Previously, case-insensitively searching the haystack "Go Go Back" for
the needle "Go Back" would return false:

1. Match the first three characters. "Go ".
2. Notice that 'G' and 'B' don't match.
3. Skip ahead 3 characters, plus 1 for the outer for-loop.
4. Now, the haystack is effectively "o Back", so the match fails.

Reducing the skip by 1 fixes this issue. I'm not 100% convinced this
fixes all cases, but I haven't been able to find any cases where it
doesn't work now. :^)
2022-03-18 23:51:56 +00:00
Idan Horowitz cec669a89a AK: Exclude StringUtils String APIs from the Kernel
These APIs are only used by userland, and String is OOM-infallible,
so let's just ifdef it out of the Kernel.
2022-02-16 22:21:37 +01:00
Xavier Defrang 9e97823ff8 AK: Add convert_to_uint_from_octal 2021-12-21 13:13:04 -08:00
Andreas Kling 8b1108e485 Everywhere: Pass AK::StringView by value 2021-11-11 01:27:46 +01:00
Peter Elliott accf4b338d AK: Expand to_int<i64> to to_int<long> and to_int<long long>
This change also applys to to_uint

On i686 u64 == long long but on x86_64 u64 == long. Therefor on either
arch, one of the instantiations is missed. This change makes sure that
all integer types have an instantiation.
2021-10-05 13:27:25 +03:30
Idan Horowitz b56b0ba689 AK: Eliminate avoidable strlen call in String::matches
We already know the length of these substrings, so there's no need to
check their lengths using strlen in the StringView(char*) constructor.

This patch also removes an accidental 1-byte OOB read that was left over
from when this method received null-terminated char pointers instead of
string views, as well removes the unnecessary lambda call (two of the
checks were impossible, and this was only called in one place, so we can
just inline it)
2021-09-28 00:31:45 +02:00
Idan Horowitz 6704961c82 AK: Replace the mutable String::replace API with an immutable version
This removes the awkward String::replace API which was the only String
API which mutated the String and replaces it with a new immutable
version that returns a new String with the replacements applied. This
also fixes a couple of UAFs that were caused by the use of this API.

As an optimization an equivalent StringView::replace API was also added
to remove an unnecessary String allocations in the format of:
`String { view }.replace(...);`
2021-09-11 20:36:43 +03:00
Idan Horowitz 6d2b003b6e AK: Make String::count not use strstr and take a StringView
This was needlessly copying StringView arguments, and was also using
strstr internally, which meant it was doing a bunch of unnecessary
strlen calls on it. This also moves the implementation to StringUtils
to allow API consistency between String and StringView.
2021-09-11 20:36:43 +03:00
Timothy Flynn 262e412634 AK: Implement method to convert a String/StringView to title case
This implementation preserves consecutive spaces in the orginal string.
2021-08-26 22:04:09 +01:00
Lenny Maiorani 97bd13264a Everywhere: Make use of container version of all_of
Problem:
- New `all_of` implementation takes the entire container so the user
  does not need to pass explicit begin/end iterators. This is unused
  except is in tests.

Solution:
- Make use of the new and more user-friendly version where possible.
2021-08-03 10:46:43 +02:00
Max Wipfli f0fcbb7751 AK: Replace usages of ctype.h with CharacterTypes.h
This replaces all remaining usages of ctype.h in AK with
CharacterTypes.h.
2021-07-07 14:05:56 +02:00
Max Wipfli 9cc35d1ba3 AK: Implement String::find_any_of() and StringView::find_any_of()
This implements StringUtils::find_any_of() and uses it in
String::find_any_of() and StringView::find_any_of(). All uses of
find_{first,last}_of have been replaced with find_any_of(), find() or
find_last(). find_{first,last}_of have subsequently been removed.
2021-07-02 21:54:21 +02:00
Max Wipfli d7a104c27c AK: Implement StringView::find_all()
This implements the StringView::find_all() method by re-implemeting the
current method existing for String in StringUtils, and using that
implementation for both String and StringView.

The rewrite uses memmem() instead of strstr(), so the String::find_all()
argument type has been changed from String to StringView, as the null
byte is no longer required.
2021-07-02 21:54:21 +02:00
Max Wipfli 56253bf389 AK: Reimplement StringView::find methods in StringUtils
This patch reimplements the StringView::find methods in StringUtils, so
they can also be used by String. The methods now also take an optional
start parameter, which moves their API in line with String's respective
methods.

This also implements a StringView::find_ast(char) method, which is
currently functionally equivalent to find_last_of(char). This is because
find_last_of(char) will be removed in a further commit.
2021-07-02 21:54:21 +02:00
sin-ack 3abcfcc178 AK: Add a way to disable the trimming of whitespace in to_*int
This behavior might not always be desirable, and so this patch adds a
way to disable it.
2021-06-18 19:18:15 +01:00
Max Wipfli 0e4f7aa8e8 AK: Add trim() method to String, StringView and StringUtils
The methods added make it possible to use the trim mechanism with
specified characters, unlike trim_whitespace(), which uses predefined
characters.
2021-06-01 09:28:05 +02:00
Brian Gianforcaro 1682f0b760 Everything: Move to SPDX license identifiers in all files.
SPDX License Identifiers are a more compact / standardized
way of representing file license information.

See: https://spdx.dev/resources/use/#identifiers

This was done with the `ambr` search and replace tool.

 ambr --no-parent-ignore --key-from-file --rep-from-file key.txt rep.txt *
2021-04-22 11:22:27 +02:00
Andreas Kling 873da38d0e AK: Remove String-from-StringView optimization
We had an unusual optimization in AK::StringView where constructing
a StringView from a String would cause it to remember the internal
StringImpl pointer of the String.

This was used to make constructing a String from a StringView fast
and copy-free.

I tried removing this optimization and indeed we started seeing a
ton of allocation traffic. However, all of it was due to a silly
pattern where functions would take a StringView and then go on
to create a String from it.

I've gone through most of the code and updated those functions to
simply take a String directly instead, which now makes this
optimization unnecessary, and indeed a source of bloat instead.

So, let's get rid of it and make StringView a little smaller. :^)
2021-04-17 01:27:31 +02:00
Andreas Kling 1f684c8123 AK: Implement case-insensitive StringUtils::matches() without allocs
Previously this would create new to_lowercase()'d strings from the
needle and the haystack. This generated a huge amount of malloc
traffic in some programs.
2021-04-17 01:27:30 +02:00
Linus Groh e265054c12 Everywhere: Remove a bunch of redundant 'AK::' namespace prefixes
This is basically just for consistency, it's quite strange to see
multiple AK container types next to each other, some with and some
without the namespace prefix - we're 'using AK::Foo;' a lot and should
leverage that. :^)
2021-02-26 16:59:56 +01:00
Linus Groh 4fafe14691 AK: Add String{,Utils}::to_snakecase()
This is an improved version of WrapperGenerator's snake_name(), which
seems like the kind of thing that could be useful elsewhere but would
end up getting duplicated - so let's add this to AK::String instead,
like to_{lowercase,uppercase}().
2021-02-21 19:47:47 +01:00
AnotherTest 39442e6d4f AK: Add String{View,}::find(StringView)
I personally mistook `find_first_of(StringView)` to be analogous to this
so let's add a `find()` method that actually searches the string.
2021-01-12 23:36:20 +01:00
AnotherTest 7029a8f605 AK: Specialise convert_to_uint<T> and to_uint<T> for 'long' variants 2021-01-11 21:09:36 +01:00
AnotherTest f3ecea1fb3 AK: Add String{,View}::is_whitespace()
+Tests!
2021-01-03 10:47:29 +01:00