Commit graph

98 commits

Author SHA1 Message Date
Andreas Kling 6724f840cd AK: Early return from empty hash table lookups to avoid hashing
When calling get() or find() on an empty HashTable or HashMap, we can
avoid hashing the sought-after key.
2024-03-16 14:27:59 +01:00
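For illustration, a minimal sketch of the guard this describes (member and helper names are assumed, not the actual AK code):

    // Sketch: bail out before computing the hash when there is nothing to find.
    template<typename K>
    Iterator find(K const& key)
    {
        if (is_empty())
            return end();
        return find_bucket(Traits<K>::hash(key), key);
    }
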
kleines Filmröllchen 9a026fc8d5 AK: Implement SipHash as the default hash algorithm for most use cases
SipHash is highly HashDoS-resistant, initialized with a random seed at
startup (i.e. non-deterministic) and usable for security-critical use
cases with large enough parameters. We just use it because it's
reasonably secure with parameters 1-3 while having excellent properties
and not being significantly slower than before.
2023-10-01 11:06:36 +03:30
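The HashDoS resistance comes from keying the hash with randomness chosen at process startup, so colliding inputs cannot be precomputed offline. A minimal illustration of that seeding idea (plain C++, not the AK SipHash implementation):

    #include <cstdint>
    #include <random>

    // Illustrative only: a per-process random seed mixed into every hash, so
    // bucket placement differs between runs and cannot be predicted by an attacker.
    static uint64_t const g_hash_seed = std::random_device {}();

    static uint64_t seeded_hash(uint64_t value)
    {
        uint64_t h = value ^ g_hash_seed;
        h ^= h >> 33;
        h *= 0xff51afd7ed558ccdULL; // Mixing constant from a common 64-bit finalizer.
        h ^= h >> 33;
        return h;
    }
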
Daniel Bertalan 4d2af7c3d6 AK: Implement reverse iterators for OrderedHashTable 2023-09-24 23:36:43 +02:00
Karol Kosek e575ee4462 AK+Kernel: Unify Traits<T>::equals()'s argument order on different types
There was a small mishmash of argument order, as seen in the table:

                 | Traits<T>::equals(U, T) | Traits<T>::equals(T, U)
   ============= | ======================= | =======================
   uses equals() | HashMap                 | Vector, HashTable
defines equals() | *String[^1]             | ByteBuffer

[^1]: String, DeprecatedString, their Fly-type equivalents and KString.

This mostly meant that you couldn't use a StringView for finding a value
in Vector<String>.

I'm changing the order of arguments to make the trait type itself first
(`Traits<T>::equals(T, U)`), as I think it's more expected and makes us
more consistent with the rest of the functions that put the stored type
first (like StringUtils functions and binary_search). I've also renamed
the variable name "other" in find functions to "entry" to give more
importance to the value.

With this change, each of the following lines will now compile
successfully:

    Vector<String>().contains_slow("WHF!"sv);
    HashTable<String>().contains("WHF!"sv);
    HashMap<ByteBuffer, int>().contains("WHF!"sv.bytes());
2023-08-23 20:21:09 +02:00
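For illustration, a plain-C++ sketch of the convention after this change, with the stored type first and the lookup key second (the types are standard-library stand-ins, not AK's):

    #include <string>
    #include <string_view>

    // Stored type first, lookup key second, mirroring Traits<T>::equals(T, U).
    struct ExampleStringTraits {
        static bool equals(std::string const& stored, std::string_view entry)
        {
            return stored == entry;
        }
    };
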
Ben Wiederhake 36ff6187f6 Everywhere: Change spelling of 'behaviour' to 'behavior'
"The official project language is American English […]."
5d2e915623/CONTRIBUTING.md?plain=1#L30

Here's a short statistic of the occurrences of the word "behavio(u)r":

$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
      2 BEHAVIOR
     24 Behaviour
     32 behaviour
    407 Behavior
    992 behavior

Therefore, it is clear that "behaviour" (56 occurrences) should be
regarded as a typo, and "behavior" (1401 occurrences) should be preferred.

Note that the occurrences in LibJS are intentionally NOT changed,
because they are taken verbatim from the specification. Hence:

$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
      2 BEHAVIOR
     10 behaviour
     24 Behaviour
    407 Behavior
   1014 behavior
2023-05-07 01:05:09 +02:00
Ben Wiederhake ee47c0275e Everywhere: Run spellcheck on all documentation 2023-05-07 01:05:09 +02:00
Aliaksandr Kalenik 4c6564e3c1 AK: Add values() method in HashTable
Add HashTable::values() method that returns all values.
2023-04-28 18:11:44 +02:00
Jelle Raaijmakers 954d660094 AK: Clear OrderedHashTable previous/next pointers on removal
With Clang, the previous/next pointers in buckets of an
`OrderedHashTable` are not cleared when a bucket is being shifted up as
a result of a removed bucket. As a result, an unfortunate pointer mixup
could lead to an infinite loop in the `HashTable` iterator, which was
exposed in `HashMap::keys()`.

Co-authored-by: Luke Wilde <lukew@serenityos.org>
2023-03-15 21:43:52 +01:00
Hediadyoin1 fd8c54d720 AK: Add take_first to HashTable and rename pop to take_last
This naming scheme matches Vector.

This also changes `take_last` to move the value it takes, and delete by
known pointer, avoiding a full lookup and potential copies.
2023-02-21 22:13:06 +01:00
Hediadyoin1 93945062a7 AK: Update HashTable's head and tail when shifting during deletion
Otherwise we end up with invalid pointers to them, breaking iteration.
2023-02-21 22:13:06 +01:00
Jelle Raaijmakers c08d137fcd AK: Reimplement HashTable with smart linear probing
Instead of rehashing on collisions, we use Robin Hood hashing: a simple
linear probe where we keep track of the distance between the bucket and
its ideal position. On insertion, we allow a new bucket to "steal" the
position of "rich" buckets (those near their ideal position) and move
them further down.

On removal, we shift buckets back up into the freed slot, decrementing
their distance while doing so.

This behavior automatically optimizes the number of required probes for
any value, and removes the need for periodic rehashing (except when
expanding the capacity).
2023-02-17 22:29:51 -07:00
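A generic sketch of Robin Hood insertion along these lines (not the AK code; it assumes a power-of-two capacity with at least one free bucket):

    #include <cstddef>
    #include <utility>
    #include <vector>

    struct Bucket {
        bool used { false };
        size_t distance { 0 }; // How far this entry sits from its ideal slot.
        int value { 0 };
    };

    void robin_hood_insert(std::vector<Bucket>& buckets, size_t hash, int value)
    {
        size_t index = hash & (buckets.size() - 1);
        size_t distance = 0;
        while (true) {
            auto& bucket = buckets[index];
            if (!bucket.used) {
                bucket = { true, distance, value };
                return;
            }
            if (bucket.distance < distance) {
                // The resident is "richer" (closer to its ideal slot): steal its
                // position and continue probing with the evicted entry instead.
                std::swap(distance, bucket.distance);
                std::swap(value, bucket.value);
            }
            index = (index + 1) & (buckets.size() - 1);
            ++distance;
        }
    }
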
Timothy Flynn 4f5353cbb8 AK: Rename double_hash to rehash_for_collision
The name is currently quite confusing as it indicates it hashes doubles.
2023-01-21 10:36:14 +01:00
Eli Youngs a2024cfb69 AK: Support popping an arbitrary element from a HashTable 2022-12-16 10:41:56 -07:00
Moustafa Raafat b8f1e1bed2 Everywhere: Remove unnecessary AK and Detail namespace scoping 2022-12-09 11:25:30 +00:00
Linus Groh d26aabff04 Everywhere: Run clang-format 2022-12-03 23:52:23 +00:00
Andreas Kling ae3ffdd521 AK: Make it possible to not import AK classes into the global namespace
This patch adds the `USING_AK_GLOBALLY` macro which is enabled by
default, but can be overridden by build flags.

This is a step towards integrating Jakt and AK types.
2022-11-26 15:51:34 +01:00
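The pattern looks roughly like this in each header (a sketch, not the exact AK source):

    #ifndef USING_AK_GLOBALLY
    #    define USING_AK_GLOBALLY 1 // On by default; build flags may override it.
    #endif

    namespace AK {
    template<typename T>
    class HashTable { /* ... */ };
    }

    #if USING_AK_GLOBALLY
    using AK::HashTable; // Only pulled into the global namespace when enabled.
    #endif
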
Zaggy1024 a1300d3797 AK: Don't crash in HashTable::clear_with_capacity on an empty table
When calling clear_with_capacity on an empty HashTable/HashMap, a null
deref would occur when trying to memset() m_buckets. Checking that it
has capacity before clearing fixes the issue.
2022-11-11 00:44:04 -07:00
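A minimal sketch of the fix (member names are assumed):

    void clear_with_capacity()
    {
        if (!m_capacity)
            return; // Empty table: m_buckets is null, so there is nothing to memset().
        __builtin_memset(m_buckets, 0, size_in_bytes());
        m_size = 0;
    }
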
Hendiadyoin1 5bf84a5b0e AK: Zero previous pointer *after* fixing the insertion list in HashTable 2022-06-23 20:25:12 +03:00
Idan Horowitz eb02425ef9 AK: Clear the previous and next pointers of deleted HashTable buckets
Usually the values of the previous and next pointers of deleted buckets
are never used, as they're not part of the main ordered bucket chain,
but if an in-place rehashing is done, which results in the bucket being
turned into a free bucket, the stale pointers will remain, at which
point any item that is inserted into said free-bucket will have either
a stale previous pointer if the HashTable was empty on insertion, or a
stale next pointer, resulting in undefined behaviour.

This commit also includes a new HashMap test that reproduces this issue
2022-06-22 21:53:13 +02:00
Vitaly Dyachkov a0a4d169f4 AK+LibGUI: Pass predicate to *_matching() methods by const reference 2022-05-08 17:02:00 +02:00
Idan Horowitz 086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
kleines Filmröllchen 09a12247fb AK: Use bucket states with special bit patterns in HashTable
This simplifies some of the bucket state handling code, as there's now
an easy way of checking the basic category of bucket state.
2022-03-31 12:06:13 +02:00
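For illustration, a state encoding along these lines lets a single bit test answer the basic "is this bucket occupied?" question (the values below are made up, not AK's):

    #include <cstdint>

    // Low bit set = bucket holds a value, regardless of the exact state.
    enum class BucketState : uint8_t {
        Free = 0x00,
        Deleted = 0x02,
        Used = 0x01,
        Rehashed = 0x03,
    };

    constexpr bool is_used_bucket(BucketState state)
    {
        return static_cast<uint8_t>(state) & 0x01;
    }
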
kleines Filmröllchen 49d29c8298 AK: Rehash HashTable in-place instead of shrinking
As seen on TV, HashTable can get "thrashed", i.e. it has a bunch of
deleted buckets that count towards the load factor. This means that hash
tables which are large enough for their contents need to be resized.
This was fixed in 9d8da16 with a workaround that shrinks the HashTable
back down in these cases, as after the resize and re-hash the load
factor is very low again. However, that's not a good solution. If you
insert and remove repeatedly around a size boundary, you might get
frequent resizes, which involve frequent re-allocations.

The new solution is an in-place rehashing algorithm that I came up with.
(Do complain to me, I'm at fault.) Basically, it iterates the buckets
and re-hashes the used buckets while marking the deleted slots empty.
The issue arises with collisions in the re-hash. For this reason, there
are two kinds of used buckets during the re-hashing: the normal "used"
buckets, which are old and are treated as free space, and the
"re-hashed" buckets, which are new and treated as used space, i.e. they
trigger probing. Therefore, the procedure for relocating a bucket's
contents is as follows:
- Locate the "real" bucket of the contents with the hash. That bucket is
  the starting point for the target bucket, and the current (old) bucket
  is the bucket we want to move.
- While we still need to move the bucket:
  - If we're the target, something strange happened last iteration or we
    just re-hashed to the same location. We're done.
  - If the target is empty or deleted, just move the bucket. We're done.
  - If the target is a re-hashed full bucket, we probe by double-hashing
    our hash as usual. Henceforth, we move our target for the next
    iteration.
  - If the target is an old full bucket, we swap the target and to-move
    buckets. Therefore, the bucket to move is at the correct location
    and the former target, which still needs to find a new place, is now
    in the bucket to move. So we can just continue with the loop; the
    target is re-obtained from the bucket to move.

This happens for each and every bucket, though some buckets are
"coincidentally" moved before their point of iteration is reached.
Either way, this guarantees full in-place movement (even without stack
storage) and therefore space complexity of O(1). Time complexity is
amortized O(2n) assuming a good hashing function.

This leads to a performance improvement of ~30% on the benchmark
introduced with the last commit.

Co-authored-by: Hendiadyoin1 <leon.a@serenityos.org>
2022-03-31 12:06:13 +02:00
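A compressed sketch of the relocation loop described above (the bucket fields and helpers are assumed names, not the actual AK code):

    // Relocate the contents of buckets[index] to their re-hashed position.
    void rehash_bucket_in_place(Bucket* buckets, size_t capacity, size_t index)
    {
        size_t target = hash_of(buckets[index]) % capacity;
        while (true) {
            if (target == index) { // Re-hashed onto itself: done.
                buckets[index].state = BucketState::Rehashed;
                return;
            }
            auto& destination = buckets[target];
            if (destination.state == BucketState::Free || destination.state == BucketState::Deleted) {
                move_into(destination, buckets[index]); // Free slot: plain move, done.
                destination.state = BucketState::Rehashed;
                return;
            }
            if (destination.state == BucketState::Rehashed) {
                target = probe_next(target, capacity); // Already settled: keep probing.
                continue;
            }
            // Old "used" bucket: swap, then keep relocating the evicted contents.
            swap_contents(destination, buckets[index]);
            destination.state = BucketState::Rehashed;
            target = hash_of(buckets[index]) % capacity;
        }
    }
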
kleines Filmröllchen bcb8937898 AK: Merge HashTable bucket state into one enum
The hash table buckets had three different state booleans that are in
fact exclusive. In preparation for further states, this commit
consolidates them into one enum. This has the added benefit of not
relying on the compiler's boolean packing anymore; we definitely now
only need one byte for the bucket state.
2022-03-31 12:06:13 +02:00
Daniel Bertalan e3eb68dd58 AK+Kernel: Avoid double memory clearing of HashTable buckets
Since the allocated memory is going to be zeroed immediately anyway,
let's avoid redundantly scrubbing it with MALLOC_SCRUB_BYTE just before
that.

The latest versions of gcc and Clang can automatically do this malloc +
memset -> calloc optimization, but I've seen a couple of places where it
failed to be done.

This commit also adds a naive kcalloc function to the kernel that
doesn't (yet) eliminate the redundancy like the userland does.
2022-03-15 11:56:46 +01:00
Andreas Kling 9d8da1697e AK: Automatically shrink HashTable when removing entries
If the utilization of a HashTable (size vs capacity) goes below 20%,
we'll now shrink the table down to capacity = (size * 2).

This fixes an issue where tables would grow infinitely when inserting
and removing keys repeatedly. Basically, we would accumulate deleted
buckets with nothing reclaiming them, and eventually deciding that we
needed to grow the table (because we grow if used+deleted > limit!)

I found this because HashTable iteration was taking a suspicious amount
of time in Core::EventLoop::get_next_timer_expiration(). Turns out the
timer table kept growing in capacity over time. That made iteration
slower and slower since HashTable iterators visit every bucket.
2022-03-07 00:08:22 +01:00
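The trigger roughly amounts to a check like this on removal (a sketch with assumed member names, using the thresholds from the message):

    void shrink_if_needed()
    {
        // Below 20% utilization, rebuild at twice the current size.
        if (m_size * 5 >= m_capacity)
            return;
        rehash(m_size * 2);
    }
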
Andreas Kling eb829924da AK: Remove return value from HashTable::remove() and HashMap::remove()
This was only used by remove_all_matching(), where it's no longer used.
2022-03-07 00:08:22 +01:00
Andreas Kling 623bdd8b6a AK: Simplify HashTable::remove_all_matching()
Just walk the table from start to finish, deleting buckets as we go.
This removes the need for remove() to return an iterator, which is
preventing me from implementing hash table auto-shrinking.
2022-03-07 00:08:22 +01:00
Idan Horowitz 9b0d90a71d AK: Support using custom comparison operations for hash compatible keys 2022-01-29 23:01:23 +02:00
James Puleo 10b25d2a57 AK: Implement HashTable::try_ensure_capacity, as used in HashMap
This was used in `HashMap::try_ensure_capacity`, but was missing from
`HashTable`'s implementation. No one had used
`HashMap::try_ensure_capacity` before so it went unnoticed!
2022-01-25 09:17:22 +01:00
Andreas Kling 5279a04c78 AK: Make Hash{Map,Table}::remove_all_matching() return removal success
These functions now return whether one or more entries were removed.
2022-01-05 18:57:14 +01:00
Andreas Kling 54cf42fac1 AK: Add HashTable::remove_all_matching(predicate)
This removes all matching entries from a table in a single pass.
2022-01-05 18:57:14 +01:00
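A usage sketch (the values are illustrative):

    #include <AK/HashTable.h>

    void drop_even_entries()
    {
        HashTable<int> table;
        table.set(1);
        table.set(2);
        table.set(3);
        // Single pass over the table; the entry 2 is removed.
        table.remove_all_matching([](int entry) { return entry % 2 == 0; });
    }
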
Hendiadyoin1 c673b7220a AK: Enable fast path for removal by hash-compatible key in HashMap/Table 2021-12-15 23:35:14 -08:00
Hendiadyoin1 d50360f5dd AK: Allow hash-compatible key types in Hash[Table|Map] lookup
This will allow us to avoid some potentially expensive type conversion
during lookup, like from String to StringView, which would allocate
memory otherwise.
2021-12-15 13:09:49 +03:30
Andrew Kaster 762b92c650 AK: Resolve clang-tidy readability-qualified-auto warnings
... In files included by Kernel/Process.cpp and Kernel/Thread.cpp
2021-11-14 22:52:35 +01:00
Andrew Kaster 22feb9d47b AK: Resolve clang-tidy readability-bool-conversion warnings
... In files included by Kernel/Process.cpp and Kernel/Thread.cpp
2021-11-14 22:52:35 +01:00
Hendiadyoin1 f76241914c AK: Allow clearing HashTables/Maps with capacity 2021-11-11 09:19:17 +01:00
Andreas Kling 9d1f238450 AK: Make HashTable and HashMap try_* functions return ErrorOr<T>
This allows us to use TRY() and MUST() with them.
2021-11-11 01:27:46 +01:00
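A sketch of how the fallible variants now compose with the error-propagation macros:

    #include <AK/Error.h>
    #include <AK/HashTable.h>
    #include <AK/Try.h>

    ErrorOr<void> fill_table(HashTable<int>& table)
    {
        TRY(table.try_ensure_capacity(128)); // Propagates allocation failure to the caller.
        TRY(table.try_set(42));
        return {};
    }
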
Ben Wiederhake f8d7b4daea AK: Add missing headers
Example failure:

IDAllocator.h only pulls in AK/HashTable.h, so any compilation unit that
includes AK/IDAllocator.h without including AK/Traits.h before it used
to be doomed to fail with the cryptic error message "In instantiation of
'AK::HashTable<T, TraitsForT, IsOrdered>::Iterator AK::HashTable<T,
TraitsForT, IsOrdered>::find(const T&) [with T = int; TraitsForT =
AK::Traits<int>]': incomplete type 'AK::Traits<int>' used in nested name
specifier".
2021-10-06 23:52:40 +01:00
Hendiadyoin1 93cf01ad7d AK: Mark HashTable::size_in_bytes() as constexpr 2021-09-10 14:33:53 +00:00
Hediadyoin1 1aa527f5b6 AK: Add OOM safe interface to HashTable/Map
This adds a new HashSetResult only returned by try_set, to signal
allocation failure during setting.
2021-09-10 14:33:53 +00:00
Andreas Kling 6ad427993a Everywhere: Behaviour => Behavior 2021-09-07 13:53:14 +02:00
Andreas Kling a940a8bf37 AK: Remove unused private HashTable::lookup_for_reading() 2021-07-21 18:18:51 +02:00
Andreas Kling f65b039c44 AK: Sprinkle [[nodiscard]] on HashMap and HashTable 2021-07-21 18:18:29 +02:00
ngc6302h 213e2af281 HashTable: Rename finders with a more accurate and self-descriptive name 2021-07-13 17:31:00 +02:00
Andreas Kling 3aabace9f5 AK: Use kfree_sized() in AK::HashTable 2021-07-11 14:14:51 +02:00
Hediadyoin1 4a81c79909 AK: Add Ordering support to HashTable and HashMap
Adds an IsOrdered flag to HashTable and HashMap, which allows iteration
in insertion order.
2021-06-15 22:16:55 +02:00
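A usage sketch showing the effect:

    #include <AK/Format.h>
    #include <AK/HashTable.h>

    void print_in_insertion_order()
    {
        OrderedHashTable<int> table;
        table.set(3);
        table.set(1);
        table.set(2);
        for (auto entry : table)
            dbgln("{}", entry); // Prints 3, 1, 2: insertion order, not hash order.
    }
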
Idan Horowitz 71c54198fa AK: Allow changing the HashTable behaviour for sets on existing entries
Specifically, replacing the existing entry or just keeping it and
canceling the set.
2021-06-09 11:48:04 +01:00
Andreas Kling c584421592 AK: Make HashTable::operator=(HashTable&&) clear the moved-from table
This is consistent with how other AK containers behave when moved from.
2021-05-30 14:34:32 +02:00
Gunnar Beutner f89e8fb71a AK+LibC: Implement malloc_good_size() and use it for Vector/HashTable
This implements the macOS API malloc_good_size() which returns the
true allocation size for a given requested allocation size. This
allows us to make use of all the available memory in a malloc chunk.

For example, for a malloc request of 35 bytes our malloc would
internally use a chunk of size 64, however the remaining 29 bytes
would be unused.

Knowing the true allocation size allows us to request more usable
memory that would otherwise be wasted and make that available for
Vector, HashTable and potentially other callers in the future.
2021-05-15 16:30:14 +02:00
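A sketch of how a container can use it to round a requested size up to what the allocator will reserve anyway (the forward declaration stands in for the real allocator header):

    #include <stddef.h>

    size_t malloc_good_size(size_t); // Provided by the allocator.

    size_t padded_capacity_in_bytes(size_t requested_bytes)
    {
        // Use the whole chunk the allocator will hand out instead of wasting its tail.
        return malloc_good_size(requested_bytes);
    }
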