Commit graph

17 commits

Author SHA1 Message Date
Ævar Arnfjörð Bjarmason 31deb28f5e fsck: don't hard die on invalid object types
Change the error fsck emits on invalid object types, such as:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>

From the very ungraceful error of:

    $ git fsck
    fatal: invalid object type
    $

To:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The tests that's being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).

To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it to accept a "struct object_info" (which contains the
OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See
f6371f9210 (sha1_file: add read_loose_object() function, 2017-01-13)
for the introduction of read_loose_object().

Since we'll need a "struct strbuf" to hold the "type_name" let's pass
it to the for_each_loose_file_in_objdir() callback to avoid allocating
a new one for each loose object in the iteration. It also makes the
memory management simpler than sticking it in fsck_loose() itself, as
we'll only need to strbuf_reset() it, with no need to do a
strbuf_release() before each "return".

Before this commit we'd never check the "type" if read_loose_object()
failed, but now we do. We therefore need to initialize it to OBJ_NONE
to be able to tell the difference between e.g. its
unpack_loose_header() having failed, and us getting past that and into
parse_loose_header().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:01 -07:00
Ævar Arnfjörð Bjarmason dccb32bf01 object-file.c: stop dying in parse_loose_header()
Make parse_loose_header() return error codes and data instead of
invoking die() by itself.

For now we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller. In a subsequent
commit we'll make read_loose_object() return an error code instead of
dying. We should also address the "allow_unknown" case (should be
moved to builtin/cat-file.c), but for now I'll be leaving it.

For making parse_loose_header() not die() change its prototype to
accept a "struct object_info *" instead of the "unsigned long *sizep"
it accepted before. Its callers can now check the populated populated
"oi->typep".

Because of this we don't need to pass in the "unsigned int flags"
which we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE, we can instead do
that check in loose_object_info().

This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.

Since 93cff9a978 (sha1_loose_object_info: return error for corrupted
objects, 2017-04-01) the return value of loose_object_info() (then
named sha1_loose_object_info()) had been a "status" variable that be
any negative value, as we were expecting to return the "enum
object_type".

The only negative type happens to be OBJ_BAD, but the code still
assumed that more might be added. This was then used later in
e.g. c84a1f3ed4 (sha1_file: refactor read_object, 2017-06-21). Now
that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.

Since parse_loose_header() doesn't need to return an arbitrary
"status" we only need to treat its "ret < 0" specially, but can
idiomatically overwrite it with our own error() return. This along
with having made unpack_loose_header() return an "enum
unpack_loose_header_result" in an earlier commit means that we can
move the previously nested if/else cases mostly into the "ULHR_OK"
branch of the "switch" statement.

We should be less silent if we reach that "status = -1" branch, which
happens if we've got trailing garbage in loose objects, see
f6371f9210 (sha1_file: add read_loose_object() function, 2017-01-13)
for a better way to handle it. For now let's punt on it, a subsequent
commit will address that edge case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason 5848fb11ac object-file.c: return ULHR_TOO_LONG on "header too long"
Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.

As a test added earlier in this series in t1006-cat-file.sh shows
we'll correctly emit zlib errors from zlib.c already in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return ULHR_TOO_LONG saying we ran into the
MAX_HEADER_LEN limit, or other negative values for "unable to unpack
<OID> header".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason 3b6a8db3b0 object-file.c: use "enum" return type for unpack_loose_header()
In a preceding commit we changed and documented unpack_loose_header()
from its previous behavior of returning any negative value or zero, to
only -1 or 0.

Let's add an "enum unpack_loose_header_result" type and use it for
these return values, and have the compiler assert that we're
exhaustively covering all of them.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason 01cab97679 object-file.c: simplify unpack_loose_short_header()
Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.

The unpack_loose_header_to_strbuf() function was added in
46f034483e (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).

Its code was mostly copy/pasted between it and both of
unpack_loose_header() and unpack_loose_short_header(). We now have a
single unpack_loose_header() function which accepts an optional
"struct strbuf *" instead.

I think the remaining unpack_loose_header() function could be further
simplified, we're carrying some complexity just to be able to emit a
garbage type longer than MAX_HEADER_LEN, we could alternatively just
say "we found a garbage type <first 32 bytes>..." instead. But let's
leave the current behavior in place for now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason ddb3474b66 object-file.c: make parse_loose_header_extended() public
Make the parse_loose_header_extended() function public and remove the
parse_loose_header() wrapper. The only direct user of it outside of
object-file.c itself was in streaming.c, that caller can simply pass
the required "struct object-info *" instead.

This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change, we
could simply use parse_loose_header_extended() there, but will leave
the API in a better end state.

It would be a better end-state to have already moved the declaration
of these functions to object-store.h to avoid the forward declaration
of "struct object_info" in cache.h, but let's leave that cleanup for
some other time.

1. https://lore.kernel.org/git/patch-v6-09.22-5b9278e7bb4-20210907T104559Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason bfff2c4833 object-file.c: return -1, not "status" from unpack_loose_header()
Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.

See d21f842690 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".

At the time that d21f842690 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later rename to our current
"unpack_loose_header()").

However, that check was removed in c84a1f3ed4 (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.

So let's do the minor cleanup of also changing this function to return
a -1.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason 74ad250a1c object-file.c: don't set "typep" when returning non-zero
When the loose_object_info() function returns an error stop faking up
the "oi->typep" to OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.

That we set this at all is a relic from the past. Before
052fe5eaca (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.

Then in a combination of 46f034483e (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.

Having read the code paths involved carefully I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.

This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.

Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the case of -1 would be the OBJ_BAD.

Having read the code for all the callers of these functions I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:00 -07:00
Junio C Hamano 33be431c0c Merge branch 'en/dir-traversal'
"git clean" and "git ls-files -i" had confusion around working on
or showing ignored paths inside an ignored directory, which has
been corrected.

* en/dir-traversal:
  dir: introduce readdir_skip_dot_and_dotdot() helper
  dir: update stale description of treat_directory()
  dir: traverse into untracked directories if they may have ignored subfiles
  dir: avoid unnecessary traversal into ignored directory
  t3001, t7300: add testcase showcasing missed directory traversal
  t7300: add testcase showing unnecessary traversal into ignored directory
  ls-files: error out on -i unless -o or -c are specified
  dir: report number of visited directories and paths with trace2
  dir: convert trace calls to trace2 equivalents
2021-05-20 08:54:59 +09:00
Elijah Newren b548f0f156 dir: introduce readdir_skip_dot_and_dotdot() helper
Many places in the code were doing
    while ((d = readdir(dir)) != NULL) {
        if (is_dot_or_dotdot(d->d_name))
            continue;
        ...process d...
    }
Introduce a readdir_skip_dot_and_dotdot() helper to make that a one-liner:
    while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
        ...process d...
    }

This helper particularly simplifies checks for empty directories.

Also use this helper in read_cached_dir() so that our statistics are
consistent across platforms.  (In other words, read_cached_dir() should
have been using is_dot_or_dotdot() and skipping such entries, but did
not and left it to treat_path() to detect and mark such entries as
path_none.)

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-13 08:45:03 +09:00
brian m. carlson 14228447c9 hash: provide per-algorithm null OIDs
Up until recently, object IDs did not have an algorithm member, only a
hash.  Consequently, it was possible to share one null (all-zeros)
object ID among all hash algorithms.  Now that we're going to be
handling objects from multiple hash algorithms, it's important to make
sure that all object IDs have a correct algorithm field.

Introduce a per-algorithm null OID, and add it to struct hash_algo.
Introduce a wrapper function as well, and use it everywhere we used to
use the null_oid constant.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-27 16:31:39 +09:00
brian m. carlson 5a6dce70d7 hash: set, copy, and use algo field in struct object_id
Now that struct object_id has an algorithm field, we should populate it.
This will allow us to handle object IDs in any supported algorithm and
distinguish between them.  Ensure that the field is written whenever we
write an object ID by storing it explicitly every time we write an
object.  Set values for the empty blob and tree values as well.

In addition, use the algorithm field to compare object IDs.  Note that
because we zero-initialize struct object_id in many places throughout
the codebase, we default to the default algorithm in cases where the
algorithm field is zero rather than explicitly initialize all of those
locations.

This leads to a branch on every comparison, but the alternative is to
compare the entire buffer each time and padding the buffer for SHA-1.
That alternative ranges up to 3.9% worse than this approach on the perf
t0001, t1450, and t1451.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-27 16:31:38 +09:00
brian m. carlson 5951bf467e Use the final_oid_fn to finalize hashing of object IDs
When we're hashing a value which is going to be an object ID, we want to
zero-pad that value if necessary.  To do so, use the final_oid_fn
instead of the final_fn anytime we're going to create an object ID to
ensure we perform this operation.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-27 16:31:38 +09:00
brian m. carlson ab795f0d77 hash: add a function to finalize object IDs
To avoid the penalty of having to branch in hash comparison functions,
we'll want to always compare the full hash member in a struct object_id,
which will require that SHA-1 object IDs be zero-padded.  To do so, add
a function which finalizes a hash context and writes it into an object
ID that performs this padding.

Move the definition of struct object_id and the constant definitions
higher up so we they are available for us to use.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-27 16:31:38 +09:00
René Scharfe ca56dadb4b use CALLOC_ARRAY
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead.  It shortens the code and infers the
element size automatically.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-13 16:00:09 -08:00
Martin Ågren bc62692757 hash-lookup: rename from sha1-lookup
Change all remnants of "sha1" in hash-lookup.c and .h and rename them to
reflect that we're not just able to handle SHA-1 these days.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04 13:01:55 -08:00
Martin Ågren e5afd4449d object-file.c: rename from sha1-file.c
Drop the last remnant of "sha1" in this file and rename it to reflect
that we're not just able to handle SHA-1 these days.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04 13:01:55 -08:00
Renamed from sha1-file.c (Browse further)