Commit graph

98 commits

Author SHA1 Message Date
Andreas Kling 6724f840cd AK: Early return from empty hash table lookups to avoid hashing
When calling get() or find() on an empty HashTable or HashMap, we can
avoid hashing the sought-after key.
2024-03-16 14:27:59 +01:00
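For illustration, a minimal sketch of the guard this describes (member and helper names are assumed, not the actual AK code):

    // Sketch: bail out before computing the hash when there is nothing to find.
    template<typename K>
    Iterator find(K const& key)
    {
        if (is_empty())
            return end();
        return find_bucket(Traits<K>::hash(key), key);
    }
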
kleines Filmröllchen 9a026fc8d5 AK: Implement SipHash as the default hash algorithm for most use cases
SipHash is highly HashDoS-resistant, initialized with a random seed at
startup (i.e. non-deterministic) and usable for security-critical use
cases with large enough parameters. We just use it because it's
reasonably secure with parameters 1-3 while having excellent properties
and not being significantly slower than before.
2023-10-01 11:06:36 +03:30
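The HashDoS resistance comes from keying the hash with randomness chosen at process startup, so colliding inputs cannot be precomputed offline. A minimal illustration of that seeding idea (plain C++, not the AK SipHash implementation):

    #include <cstdint>
    #include <random>

    // Illustrative only: a per-process random seed mixed into every hash, so
    // bucket placement differs between runs and cannot be predicted by an attacker.
    static uint64_t const g_hash_seed = std::random_device {}();

    static uint64_t seeded_hash(uint64_t value)
    {
        uint64_t h = value ^ g_hash_seed;
        h ^= h >> 33;
        h *= 0xff51afd7ed558ccdULL; // Mixing constant from a common 64-bit finalizer.
        h ^= h >> 33;
        return h;
    }
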
Daniel Bertalan 4d2af7c3d6 AK: Implement reverse iterators for OrderedHashTable 2023-09-24 23:36:43 +02:00
Karol Kosek e575ee4462 AK+Kernel: Unify Traits<T>::equals()'s argument order on different types
There was a small mishmash of argument order, as seen in the table:

                 | Traits<T>::equals(U, T) | Traits<T>::equals(T, U)
   ============= | ======================= | =======================
   uses equals() | HashMap                 | Vector, HashTable
defines equals() | *String[^1]             | ByteBuffer

[^1]: String, DeprecatedString, their Fly-type equivalents and KString.

This mostly meant that you couldn't use a StringView for finding a value
in Vector<String>.

I'm changing the order of arguments to make the trait type itself first
(`Traits<T>::equals(T, U)`), as I think it's more expected and makes us
more consistent with the rest of the functions that put the stored type
first (like StringUtils functions and binary_search). I've also renamed
the variable name "other" in find functions to "entry" to give more
importance to the value.

With this change, each of the following lines will now compile
successfully:

    Vector<String>().contains_slow("WHF!"sv);
    HashTable<String>().contains("WHF!"sv);
    HashMap<ByteBuffer, int>().contains("WHF!"sv.bytes());
2023-08-23 20:21:09 +02:00
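For illustration, a plain-C++ sketch of the convention after this change, with the stored type first and the lookup key second (the types are standard-library stand-ins, not AK's):

    #include <string>
    #include <string_view>

    // Stored type first, lookup key second, mirroring Traits<T>::equals(T, U).
    struct ExampleStringTraits {
        static bool equals(std::string const& stored, std::string_view entry)
        {
            return stored == entry;
        }
    };
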
Ben Wiederhake 36ff6187f6 Everywhere: Change spelling of 'behaviour' to 'behavior'
"The official project language is American English […]."
5d2e915623/CONTRIBUTING.md?plain=1#L30

Here's a short statistic of the occurrences of the word "behavio(u)r":

$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
      2 BEHAVIOR
     24 Behaviour
     32 behaviour
    407 Behavior
    992 behavior

Therefore, it is clear that "behaviour" (56 occurrences) should be
regarded as a typo, and "behavior" (1401 occurrences) should be preferred.

Note that the occurrences in LibJS are intentionally NOT changed,
because they are taken verbatim from the specification. Hence:

$ git grep -IPioh 'behaviou?r' | sort | uniq -c | sort -n
      2 BEHAVIOR
     10 behaviour
     24 Behaviour
    407 Behavior
   1014 behavior
2023-05-07 01:05:09 +02:00
Ben Wiederhake ee47c0275e Everywhere: Run spellcheck on all documentation 2023-05-07 01:05:09 +02:00
Aliaksandr Kalenik 4c6564e3c1 AK: Add values() method in HashTable
Add HashTable::values() method that returns all values.
2023-04-28 18:11:44 +02:00
Jelle Raaijmakers 954d660094 AK: Clear OrderedHashTable previous/next pointers on removal
With Clang, the previous/next pointers in buckets of an
`OrderedHashTable` are not cleared when a bucket is being shifted up as
a result of a removed bucket. As a result, an unfortunate pointer mixup
could lead to an infinite loop in the `HashTable` iterator, which was
exposed in `HashMap::keys()`.

Co-authored-by: Luke Wilde <lukew@serenityos.org>
2023-03-15 21:43:52 +01:00
Hediadyoin1 fd8c54d720 AK: Add take_first to HashTable and rename pop to take_last
This naming scheme matches Vector.

This also changes `take_last` to move the value it takes, and delete by
known pointer, avoiding a full lookup and potential copies.
2023-02-21 22:13:06 +01:00
Hediadyoin1 93945062a7 AK: Update HashTable's head and tail when shifting during deletion
Otherwise we end up with invalid pointers to them, breaking iteration.
2023-02-21 22:13:06 +01:00
Jelle Raaijmakers c08d137fcd AK: Reimplement HashTable with smart linear probing
Instead of rehashing on collisions, we use Robin Hood hashing: a simple
linear probe where we keep track of the distance between the bucket and
its ideal position. On insertion, we allow a new bucket to "steal" the
position of "rich" buckets (those near their ideal position) and move
them further down.

On removal, we shift buckets back up into the freed slot, decrementing
their distance while doing so.

This behavior automatically optimizes the number of required probes for
any value, and removes the need for periodic rehashing (except when
expanding the capacity).
2023-02-17 22:29:51 -07:00
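A generic sketch of Robin Hood insertion along these lines (not the AK code; it assumes a power-of-two capacity with at least one free bucket):

    #include <cstddef>
    #include <utility>
    #include <vector>

    struct Bucket {
        bool used { false };
        size_t distance { 0 }; // How far this entry sits from its ideal slot.
        int value { 0 };
    };

    void robin_hood_insert(std::vector<Bucket>& buckets, size_t hash, int value)
    {
        size_t index = hash & (buckets.size() - 1);
        size_t distance = 0;
        while (true) {
            auto& bucket = buckets[index];
            if (!bucket.used) {
                bucket = { true, distance, value };
                return;
            }
            if (bucket.distance < distance) {
                // The resident is "richer" (closer to its ideal slot): steal its
                // position and continue probing with the evicted entry instead.
                std::swap(distance, bucket.distance);
                std::swap(value, bucket.value);
            }
            index = (index + 1) & (buckets.size() - 1);
            ++distance;
        }
    }
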
Timothy Flynn 4f5353cbb8 AK: Rename double_hash to rehash_for_collision
The name is currently quite confusing as it indicates it hashes doubles.
2023-01-21 10:36:14 +01:00
Eli Youngs a2024cfb69 AK: Support popping an arbitrary element from a HashTable 2022-12-16 10:41:56 -07:00
Moustafa Raafat b8f1e1bed2 Everywhere: Remove unnecessary AK and Detail namespace scoping 2022-12-09 11:25:30 +00:00
Linus Groh d26aabff04 Everywhere: Run clang-format 2022-12-03 23:52:23 +00:00
Andreas Kling ae3ffdd521 AK: Make it possible to not import AK classes into the global namespace
This patch adds the `USING_AK_GLOBALLY` macro which is enabled by
default, but can be overridden by build flags.

This is a step towards integrating Jakt and AK types.
2022-11-26 15:51:34 +01:00
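The pattern looks roughly like this in each header (a sketch, not the exact AK source):

    #ifndef USING_AK_GLOBALLY
    #    define USING_AK_GLOBALLY 1 // On by default; build flags may override it.
    #endif

    namespace AK {
    template<typename T>
    class HashTable { /* ... */ };
    }

    #if USING_AK_GLOBALLY
    using AK::HashTable; // Only pulled into the global namespace when enabled.
    #endif
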
Zaggy1024 a1300d3797 AK: Don't crash in HashTable::clear_with_capacity on an empty table
When calling clear_with_capacity on an empty HashTable/HashMap, a null
deref would occur when trying to memset() m_buckets. Checking that it
has capacity before clearing fixes the issue.
2022-11-11 00:44:04 -07:00
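A minimal sketch of the fix (member names are assumed):

    void clear_with_capacity()
    {
        if (!m_capacity)
            return; // Empty table: m_buckets is null, so there is nothing to memset().
        __builtin_memset(m_buckets, 0, size_in_bytes());
        m_size = 0;
    }
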
Hendiadyoin1 5bf84a5b0e AK: Zero previous pointer *after* fixing the insertion list in HashTable 2022-06-23 20:25:12 +03:00
Idan Horowitz eb02425ef9 AK: Clear the previous and next pointers of deleted HashTable buckets
Usually the values of the previous and next pointers of deleted buckets
are never used, as they're not part of the main ordered bucket chain,
but if an in-place rehashing is done, which results in the bucket being
turned into a free bucket, the stale pointers will remain, at which
point any item that is inserted into said free-bucket will have either
a stale previous pointer if the HashTable was empty on insertion, or a
stale next pointer, resulting in undefined behaviour.

This commit also includes a new HashMap test that reproduces this issue
2022-06-22 21:53:13 +02:00
Vitaly Dyachkov a0a4d169f4 AK+LibGUI: Pass predicate to *_matching() methods by const reference 2022-05-08 17:02:00 +02:00
Idan Horowitz 086969277e Everywhere: Run clang-format 2022-04-01 21:24:45 +01:00
kleines Filmröllchen 09a12247fb AK: Use bucket states with special bit patterns in HashTable
This simplifies some of the bucket state handling code, as there's now
an easy way of checking the basic category of bucket state.
2022-03-31 12:06:13 +02:00
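For illustration, a state encoding along these lines lets a single bit test answer the basic "is this bucket occupied?" question (the values below are made up, not AK's):

    #include <cstdint>

    // Low bit set = bucket holds a value, regardless of the exact state.
    enum class BucketState : uint8_t {
        Free = 0x00,
        Deleted = 0x02,
        Used = 0x01,
        Rehashed = 0x03,
    };

    constexpr bool is_used_bucket(BucketState state)
    {
        return static_cast<uint8_t>(state) & 0x01;
    }
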
kleines Filmröllchen 49d29c8298 AK: Rehash HashTable in-place instead of shrinking
As seen on TV, HashTable can get "thrashed", i.e. it has a bunch of
deleted buckets that count towards the load factor. This means that hash
tables which are large enough for their contents need to be resized.
This was fixed in 9d8da16 with a workaround that shrinks the HashTable
back down in these cases, as after the resize and re-hash the load
factor is very low again. However, that's not a good solution. If you
insert and remove repeatedly around a size boundary, you might get
frequent resizes, which involve frequent re-allocations.

The new solution is an in-place rehashing algorithm that I came up with.
(Do complain to me, I'm at fault.) Basically, it iterates the buckets
and re-hashes the used buckets while marking the deleted slots empty.
The issue arises with collisions in the re-hash. For this reason, there
are two kinds of used buckets during the re-hashing: the normal "used"
buckets, which are old and are treated as free space, and the
"re-hashed" buckets, which are new and treated as used space, i.e. they
trigger probing. Therefore, the procedure for relocating a bucket's
contents is as follows:
- Locate the "real" bucket of the contents with the hash. That bucket is
  the starting point for the target bucket, and the current (old) bucket
  is the bucket we want to move.
- While we still need to move the bucket:
  - If we're the target, something strange happened last iteration or we
    just re-hashed to the same location. We're done.
  - If the target is empty or deleted, just move the bucket. We're done.
  - If the target is a re-hashed full bucket, we probe by double-hashing
    our hash as usual. Henceforth, we move our target for the next
    iteration.
  - If the target is an old full bucket, we swap the target and to-move
    buckets. Therefore, the bucket to move is at the correct location
    and the former target, which still needs to find a new place, is now
    in the bucket to move. So we can just continue with the loop; the
    target is re-obtained from the bucket to move.

This happens for each and every bucket, though some buckets are
"coincidentally" moved before their point of iteration is reached.
Either way, this guarantees full in-place movement (even without stack
storage) and therefore space complexity of O(1). Time complexity is
amortized O(2n) assuming a good hashing function.

This leads to a performance improvement of ~30% on the benchmark
introduced with the last commit.

Co-authored-by: Hendiadyoin1 <leon.a@serenityos.org>
2022-03-31 12:06:13 +02:00
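A compressed sketch of the relocation loop described above (the bucket fields and helpers are assumed names, not the actual AK code):

    // Relocate the contents of buckets[index] to their re-hashed position.
    void rehash_bucket_in_place(Bucket* buckets, size_t capacity, size_t index)
    {
        size_t target = hash_of(buckets[index]) % capacity;
        while (true) {
            if (target == index) { // Re-hashed onto itself: done.
                buckets[index].state = BucketState::Rehashed;
                return;
            }
            auto& destination = buckets[target];
            if (destination.state == BucketState::Free || destination.state == BucketState::Deleted) {
                move_into(destination, buckets[index]); // Free slot: plain move, done.
                destination.state = BucketState::Rehashed;
                return;
            }
            if (destination.state == BucketState::Rehashed) {
                target = probe_next(target, capacity); // Already settled: keep probing.
                continue;
            }
            // Old "used" bucket: swap, then keep relocating the evicted contents.
            swap_contents(destination, buckets[index]);
            destination.state = BucketState::Rehashed;
            target = hash_of(buckets[index]) % capacity;
        }
    }
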
kleines Filmröllchen bcb8937898 AK: Merge HashTable bucket state into one enum
The hash table buckets had three different state booleans that are in
fact exclusive. In preparation for further states, this commit
consolidates them into one enum. This has the added benefit of not
relying on the compiler's boolean packing anymore; we definitely now
only need one byte for the bucket state.
2022-03-31 12:06:13 +02:00
Daniel Bertalan e3eb68dd58 AK+Kernel: Avoid double memory clearing of HashTable buckets
Since the allocated memory is going to be zeroed immediately anyway,
let's avoid redundantly scrubbing it with MALLOC_SCRUB_BYTE just before
that.

The latest versions of gcc and Clang can automatically do this malloc +
memset -> calloc optimization, but I've seen a couple of places where it
failed to be done.

This commit also adds a naive kcalloc function to the kernel that
doesn't (yet) eliminate the redundancy like the userland does.
2022-03-15 11:56:46 +01:00
Andreas Kling 9d8da1697e AK: Automatically shrink HashTable when removing entries
If the utilization of a HashTable (size vs capacity) goes below 20%,
we'll now shrink the table down to capacity = (size * 2).

This fixes an issue where tables would grow infinitely when inserting
and removing keys repeatedly. Basically, we would accumulate deleted
buckets with nothing reclaiming them, and eventually deciding that we
needed to grow the table (because we grow if used+deleted > limit!)

I found this because HashTable iteration was taking a suspicious amount
of time in Core::EventLoop::get_next_timer_expiration(). Turns out the
timer table kept growing in capacity over time. That made iteration
slower and slower since HashTable iterators visit every bucket.
2022-03-07 00:08:22 +01:00
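The trigger roughly amounts to a check like this on removal (a sketch with assumed member names, using the thresholds from the message):

    void shrink_if_needed()
    {
        // Below 20% utilization, rebuild at twice the current size.
        if (m_size * 5 >= m_capacity)
            return;
        rehash(m_size * 2);
    }
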
Andreas Kling eb829924da AK: Remove return value from HashTable::remove() and HashMap::remove()
This was only used by remove_all_matching(), where it's no longer used.
2022-03-07 00:08:22 +01:00
Andreas Kling 623bdd8b6a AK: Simplify HashTable::remove_all_matching()
Just walk the table from start to finish, deleting buckets as we go.
This removes the need for remove() to return an iterator, which is
preventing me from implementing hash table auto-shrinking.
2022-03-07 00:08:22 +01:00
Idan Horowitz 9b0d90a71d AK: Support using custom comparison operations for hash compatible keys 2022-01-29 23:01:23 +02:00
James Puleo 10b25d2a57 AK: Implement HashTable::try_ensure_capacity, as used in HashMap
This was used in `HashMap::try_ensure_capacity`, but was missing from
`HashTable`'s implementation. No one had used
`HashMap::try_ensure_capacity` before so it went unnoticed!
2022-01-25 09:17:22 +01:00
Andreas Kling 5279a04c78 AK: Make Hash{Map,Table}::remove_all_matching() return removal success
These functions now return whether one or more entries were removed.
2022-01-05 18:57:14 +01:00
Andreas Kling 54cf42fac1 AK: Add HashTable::remove_all_matching(predicate)
This removes all matching entries from a table in a single pass.
2022-01-05 18:57:14 +01:00
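A usage sketch (the values are illustrative):

    #include <AK/HashTable.h>

    void drop_even_entries()
    {
        HashTable<int> table;
        table.set(1);
        table.set(2);
        table.set(3);
        // Single pass over the table; the entry 2 is removed.
        table.remove_all_matching([](int entry) { return entry % 2 == 0; });
    }
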
Hendiadyoin1 c673b7220a AK: Enable fast path for removal by hash-compatible key in HashMap/Table 2021-12-15 23:35:14 -08:00
Hendiadyoin1 d50360f5dd AK: Allow hash-compatible key types in Hash[Table|Map] lookup
This will allow us to avoid some potentially expensive type conversion
during lookup, like from String to StringView, which would allocate
memory otherwise.
2021-12-15 13:09:49 +03:30
Andrew Kaster 762b92c650 AK: Resolve clang-tidy readability-qualified-auto warnings
... In files included by Kernel/Process.cpp and Kernel/Thread.cpp
2021-11-14 22:52:35 +01:00
Andrew Kaster 22feb9d47b AK: Resolve clang-tidy readability-bool-conversion warnings
... In files included by Kernel/Process.cpp and Kernel/Thread.cpp
2021-11-14 22:52:35 +01:00
Hendiadyoin1 f76241914c AK: Allow clearing HashTables/Maps with capacity 2021-11-11 09:19:17 +01:00
Andreas Kling 9d1f238450 AK: Make HashTable and HashMap try_* functions return ErrorOr<T>
This allows us to use TRY() and MUST() with them.
2021-11-11 01:27:46 +01:00
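A sketch of how the fallible variants now compose with the error-propagation macros:

    #include <AK/Error.h>
    #include <AK/HashTable.h>
    #include <AK/Try.h>

    ErrorOr<void> fill_table(HashTable<int>& table)
    {
        TRY(table.try_ensure_capacity(128)); // Propagates allocation failure to the caller.
        TRY(table.try_set(42));
        return {};
    }
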
Ben Wiederhake f8d7b4daea AK: Add missing headers
Example failure:

IDAllocator.h only pulls in AK/HashTable.h, so any compilation unit that
includes AK/IDAllocator.h without including AK/Traits.h before it used
to be doomed to fail with the cryptic error message "In instantiation of
'AK::HashTable<T, TraitsForT, IsOrdered>::Iterator AK::HashTable<T,
TraitsForT, IsOrdered>::find(const T&) [with T = int; TraitsForT =
AK::Traits<int>]': incomplete type 'AK::Traits<int>' used in nested name
specifier".
2021-10-06 23:52:40 +01:00
Hendiadyoin1 93cf01ad7d AK: Mark HashTable::size_in_bytes() as constexpr 2021-09-10 14:33:53 +00:00
Hediadyoin1 1aa527f5b6 AK: Add OOM safe interface to HashTable/Map
This adds a new HashSetResult only returned by try_set, to signal
allocation failure during setting.
2021-09-10 14:33:53 +00:00
Andreas Kling 6ad427993a Everywhere: Behaviour => Behavior 2021-09-07 13:53:14 +02:00
Andreas Kling a940a8bf37 AK: Remove unused private HashTable::lookup_for_reading() 2021-07-21 18:18:51 +02:00
Andreas Kling f65b039c44 AK: Sprinkle [[nodiscard]] on HashMap and HashTable 2021-07-21 18:18:29 +02:00
ngc6302h 213e2af281 HashTable: Rename finders with a more accurate and self-descriptive name 2021-07-13 17:31:00 +02:00
Andreas Kling 3aabace9f5 AK: Use kfree_sized() in AK::HashTable 2021-07-11 14:14:51 +02:00
Hediadyoin1 4a81c79909 AK: Add Ordering support to HashTable and HashMap
Adds an IsOrdered flag to HashTable and HashMap, which allows iteration
in insertion order.
2021-06-15 22:16:55 +02:00
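A usage sketch showing the effect:

    #include <AK/Format.h>
    #include <AK/HashTable.h>

    void print_in_insertion_order()
    {
        OrderedHashTable<int> table;
        table.set(3);
        table.set(1);
        table.set(2);
        for (auto entry : table)
            dbgln("{}", entry); // Prints 3, 1, 2: insertion order, not hash order.
    }
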
Idan Horowitz 71c54198fa AK: Allow changing the HashTable behaviour for sets on existing entries
Specifically, replacing the existing entry or just keeping it and
canceling the set.
2021-06-09 11:48:04 +01:00
Andreas Kling c584421592 AK: Make HashTable::operator=(HashTable&&) clear the moved-from table
This is consistent with how other AK containers behave when moved from.
2021-05-30 14:34:32 +02:00
Gunnar Beutner f89e8fb71a AK+LibC: Implement malloc_good_size() and use it for Vector/HashTable
This implements the macOS API malloc_good_size() which returns the
true allocation size for a given requested allocation size. This
allows us to make use of all the available memory in a malloc chunk.

For example, for a malloc request of 35 bytes our malloc would
internally use a chunk of size 64, however the remaining 29 bytes
would be unused.

Knowing the true allocation size allows us to request more usable
memory that would otherwise be wasted and make that available for
Vector, HashTable and potentially other callers in the future.
2021-05-15 16:30:14 +02:00
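A sketch of how a container can use it to round a requested size up to what the allocator will reserve anyway (the forward declaration stands in for the real allocator header):

    #include <stddef.h>

    size_t malloc_good_size(size_t); // Provided by the allocator.

    size_t padded_capacity_in_bytes(size_t requested_bytes)
    {
        // Use the whole chunk the allocator will hand out instead of wasting its tail.
        return malloc_good_size(requested_bytes);
    }
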