The "hash-ll.h" header was introduced via d1cbe1e6d8 (hash-ll.h: split
out of hash.h to remove dependency on repository.h, 2023-04-22) to make
explicit the split between hash-related functions that rely on the
global `the_repository`, and those that don't. This split is no longer
necessary now that we we have removed the reliance on `the_repository`.
Merge "hash-ll.h" back into "hash.h". This causes some code units to not
include "repository.h" anymore, which requires us to add some forward
declarations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `empty_tree_oid_hex()` function use `the_repository` to derive the
hash function that shall be used. Require callers to pass in the hash
algorithm to get rid of this implicit dependency.
While at it, remove the unused `empty_blob_oid_hex()` function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both functions `is_empty_{blob,tree}_oid()` use `the_repository` to
derive the hash function that shall be used. Require callers to pass in
the hash algorithm to get rid of this implicit dependency.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `is_null_oid()` uses `oideq(oid, null_oid())` to check
whether a given object ID is the all-zero object ID. `null_oid()`
implicitly relies on `the_repository` though to return the correct null
object ID.
Get rid of this dependency by always comparing the complete hash array
for being all-zeroes. This is possible due to the refactoring of object
IDs so that their hash arrays are always fully initialized.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the preceding commit, the hash array of object IDs is now fully
zero-padded even when the hash algorithm's output is smaller than the
array length. With that, we can now adapt both `oidcmp()` and `oideq()`
to unconditionally memcmp(3P) the whole array instead of depending on
the hash size.
While it may feel inefficient to compare unused bytes for e.g. SHA-1, in
practice the compiler should now be able to produce code that is better
optimized both because we have no branch anymore, but also because the
size to compare is now known at compile time. Goldbolt spits out the
following assembly on an x86_64 platform with GCC 14.1 for the old and
new implementations of `oidcmp()`:
oidcmp_old:
movsx rax, DWORD PTR [rdi+32]
test eax, eax
jne .L2
mov rax, QWORD PTR the_repository[rip]
cmp QWORD PTR [rax+16], 32
je .L6
.L4:
mov edx, 20
jmp memcmp
.L2:
lea rdx, [rax+rax*2]
lea rax, [rax+rdx*4]
lea rax, hash_algos[0+rax*8]
cmp QWORD PTR [rax+16], 32
jne .L4
.L6:
mov edx, 32
jmp memcmp
oidcmp_new:
mov edx, 32
jmp memcmp
The new implementation gets ridi of all the branches and effectively
only ends setting up `edx` for `memcmp()` and then calling it.
And for `oideq()`:
oideq_old:
movsx rcx, DWORD PTR [rdi+32]
mov rax, rdi
mov rdx, rsi
test ecx, ecx
jne .L2
mov rcx, QWORD PTR the_repository[rip]
cmp QWORD PTR [rcx+16], 32
mov rcx, QWORD PTR [rax]
je .L12
.L4:
mov rsi, QWORD PTR [rax+8]
xor rcx, QWORD PTR [rdx]
xor rsi, QWORD PTR [rdx+8]
or rcx, rsi
je .L13
.L8:
mov eax, 1
test eax, eax
sete al
movzx eax, al
ret
.L2:
lea rsi, [rcx+rcx*2]
lea rcx, [rcx+rsi*4]
lea rcx, hash_algos[0+rcx*8]
cmp QWORD PTR [rcx+16], 32
mov rcx, QWORD PTR [rax]
jne .L4
.L12:
mov rsi, QWORD PTR [rax+8]
xor rcx, QWORD PTR [rdx]
xor rsi, QWORD PTR [rdx+8]
or rcx, rsi
jne .L8
mov rcx, QWORD PTR [rax+16]
mov rax, QWORD PTR [rax+24]
xor rcx, QWORD PTR [rdx+16]
xor rax, QWORD PTR [rdx+24]
or rcx, rax
jne .L8
xor eax, eax
.L14:
test eax, eax
sete al
movzx eax, al
ret
.L13:
mov edi, DWORD PTR [rdx+16]
cmp DWORD PTR [rax+16], edi
jne .L8
xor eax, eax
jmp .L14
oideq_new:
mov rax, QWORD PTR [rdi]
mov rdx, QWORD PTR [rdi+8]
xor rax, QWORD PTR [rsi]
xor rdx, QWORD PTR [rsi+8]
or rax, rdx
je .L5
.L2:
mov eax, 1
xor eax, 1
ret
.L5:
mov rax, QWORD PTR [rdi+16]
mov rdx, QWORD PTR [rdi+24]
xor rax, QWORD PTR [rsi+16]
xor rdx, QWORD PTR [rsi+24]
or rax, rdx
jne .L2
xor eax, eax
xor eax, 1
ret
Interestingly, the compiler decides to split the comparisons into two so
that it first compares the lower half of the object ID for equality and
then the upper half. If the first check shows a difference, then we
wouldn't even end up comparing the second half.
In both cases, the new generated code is significantly shorter and has
way less branches. While I didn't benchmark the change, I'd be surprised
if the new code was slower.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `oidcmp()` and `oideq()` functions only compare the prefix length as
specified by the given hash algorithm. This mandates that the object IDs
have a valid hash algorithm set, or otherwise we wouldn't be able to
figure out that prefix. As we do not have a hash algorithm in many
cases, for example when handling null object IDs, this assumption cannot
always be fulfilled. We thus have a fallback in place that instead uses
`the_repository` to derive the hash function. This implicit dependency
is hidden away from callers and can be quite surprising, especially in
contexts where there may be no repository.
In theory, we can adapt those functions to always memcmp(3P) the whole
length of their hash arrays. But there exist a couple of sites where we
populate `struct object_id`s such that only the prefix of its hash that
is actually used by the hash algorithm is populated. The remaining bytes
are left uninitialized. The fact that those bytes are uninitialized also
leads to warnings under Valgrind in some places where we copy those
bytes.
Refactor callsites where we populate object IDs to always initialize all
bytes. This also allows us to get rid of `oidcpy_with_padding()`, for
one because the input is now fully initialized, and because `oidcpy()`
will now always copy the whole hash array.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both `oidread()` and `oidclr()` use `the_repository` to derive the hash
function that shall be used. Require callers to pass in the hash
algorithm to get rid of this implicit dependency.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many of our hash functions have two variants, one receiving a `struct
git_hash_algo` and one that derives it via `the_repository`. Adapt all
of those functions to always require the hash algorithm as input and
drop the variants that do not accept one.
As those functions are now independent of `the_repository`, we can move
them from "hash.h" to "hash-ll.h".
Note that both in this and subsequent commits in this series we always
just pass `the_repository->hash_algo` as input even if it is obvious
that there is a repository in the context that we should be using the
hash from instead. This is done to be on the safe side and not introduce
any regressions. All callsites should eventually be amended to use a
repo passed via parameters, but this is outside the scope of this patch
series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Support short oids encoded in any algorithm, while ensuring enough of
the oid is specified to disambiguate between all of the oids in the
repository encoded in any algorithm.
By default have the code continue to only accept oids specified in the
storage hash algorithm of the repository, but when something is
ambiguous display all of the possible oids from any accepted oid
encoding.
A new flag is added GET_OID_HASH_ANY that when supplied causes the
code to accept oids specified in any hash algorithm, and to return the
oids that were resolved.
This implements the functionality that allows both SHA-1 and SHA-256
object names, from the "Object names on the command line" section of
the hash function transition document.
Care is taken in get_short_oid so that when the result is ambiguous
the output remains the same if GIT_OID_HASH_ANY was not supplied. If
GET_OID_HASH_ANY was supplied objects of any hash algorithm that match
the prefix are displayed.
This required updating repo_for_each_abbrev to give it a parameter so
that it knows to look at all hash algorithms.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adjust to OpenSSL 3+, which deprecates its SHA-1 functions based on
its traditional API, by using its EVP API instead.
* ew/hash-with-openssl-evp:
avoid SHA-1 functions deprecated in OpenSSL 3+
sha256: avoid functions deprecated in OpenSSL 3+
OpenSSL 3+ deprecates the SHA1_Init, SHA1_Update, and SHA1_Final
functions, leading to errors when building with `DEVELOPER=1'.
Use the newer EVP_* API with OpenSSL 3+ (only) despite being more
error-prone and less efficient due to heap allocations.
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
OpenSSL 3+ deprecates the SHA256_Init, SHA256_Update, and SHA256_Final
functions, leading to errors when building with `DEVELOPER=1'.
Use the newer EVP_* API with OpenSSL 3+ despite being more
error-prone and less efficient due to heap allocations.
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
oidhash() was used by both hashmap and khash, which makes sense.
However, the location of this function in hashmap.[ch] meant that
khash.h had to depend upon hashmap.h, making people unfamiliar with
khash think that it was built upon hashmap. (Or at least, I personally
was confused for a while about this in the past.)
Move this function to hash-ll, so that khash.h can stop depending upon
hashmap.h.
This has another benefit as well: it allows us to remove hashmap.h's
dependency on hash-ll.h. While some callers of hashmap.h were making
use of oidhash, most were not, so this change provides another way to
reduce the number of includes.
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository->hash_algo). However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id. Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.
This allows hash.h and object.h to be fairly small, minimal headers. It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>