#!/bin/sh

test_description='git fsck random collection of tests

* (HEAD) B
* (main) A
'

. ./test-lib.sh

test_expect_success setup '
	git config gc.auto 0 &&
	git config i18n.commitencoding ISO-8859-1 &&
	test_commit A fileA one &&
	git config --unset i18n.commitencoding &&
	git checkout HEAD^0 &&
	test_commit B fileB two &&
	git tag -d A B &&
	git reflog expire --expire=now --all
'

test_expect_success 'loose objects borrowed from alternate are not missing' '
	mkdir another &&
	(
		cd another &&
		git init &&
		echo ../../../.git/objects >.git/objects/info/alternates &&
		test_commit C fileC one &&
		git fsck --no-dangling >../actual 2>&1
	) &&
	test_must_be_empty actual
'

test_expect_success 'HEAD is part of refs, valid objects appear valid' '
	git fsck >actual 2>&1 &&
	test_must_be_empty actual
'

# Corruption tests follow. Make sure to remove all traces of the
# specific corruption you test afterwards, lest a later test trip over
# it.

sha1_file () {
	git rev-parse --git-path objects/$(test_oid_to_path "$1")
}

remove_object () {
	rm "$(sha1_file "$1")"
}

test_expect_success 'object with hash mismatch' '
	git init --bare hash-mismatch &&
	(
		cd hash-mismatch &&

		oid=$(echo blob | git hash-object -w --stdin) &&
		oldoid=$oid &&
		old=$(test_oid_to_path "$oid") &&
		new=$(dirname $old)/$(test_oid ff_2) &&
		oid="$(dirname $new)$(basename $new)" &&

		mv objects/$old objects/$new &&
		git update-index --add --cacheinfo 100644 $oid foo &&
		tree=$(git write-tree) &&
		cmt=$(echo bogus | git commit-tree $tree) &&
		git update-ref refs/heads/bogus $cmt &&

		test_must_fail git fsck 2>out &&
		grep "$oldoid: hash-path mismatch, found at: .*$new" out
	)
'

test_expect_success 'object with hash and type mismatch' '
	git init --bare hash-type-mismatch &&
	(
		cd hash-type-mismatch &&

		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
		oldoid=$oid &&
		old=$(test_oid_to_path "$oid") &&
		new=$(dirname $old)/$(test_oid ff_2) &&
		oid="$(dirname $new)$(basename $new)" &&

		mv objects/$old objects/$new &&
		git update-index --add --cacheinfo 100644 $oid foo &&
		tree=$(git write-tree) &&
		cmt=$(echo bogus | git commit-tree $tree) &&
		git update-ref refs/heads/bogus $cmt &&

		test_must_fail git fsck 2>out &&
		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
	)
'

test_expect_success 'zlib corrupt loose object output ' '
	git init --bare corrupt-loose-output &&
	(
		cd corrupt-loose-output &&
		oid=$(git hash-object -w --stdin --literally </dev/null) &&
		oidf=objects/$(test_oid_to_path "$oid") &&
		chmod +w $oidf &&
		echo extra garbage >>$oidf &&

		cat >expect.error <<-EOF &&
		error: garbage at end of loose object '\''$oid'\''
		error: unable to unpack contents of ./$oidf
		error: $oid: object corrupt or missing: ./$oidf
		EOF
		test_must_fail git fsck 2>actual &&
		grep ^error: actual >error &&
		test_cmp expect.error error
	)
'

test_expect_success 'branch pointing to non-commit' '
	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
	test_when_finished "git update-ref -d refs/heads/invalid" &&
	test_must_fail git fsck 2>out &&
	test_grep "not a commit" out
'

test_expect_success 'HEAD link pointing at a funny object' '
	test_when_finished "mv .git/SAVED_HEAD .git/HEAD" &&
	mv .git/HEAD .git/SAVED_HEAD &&
	echo $ZERO_OID >.git/HEAD &&
	# avoid corrupt/broken HEAD from interfering with repo discovery
	test_must_fail env GIT_DIR=.git git fsck 2>out &&
	test_grep "detached HEAD points" out
'

test_expect_success 'HEAD link pointing at a funny place' '
	test_when_finished "mv .git/SAVED_HEAD .git/HEAD" &&
	mv .git/HEAD .git/SAVED_HEAD &&
	echo "ref: refs/funny/place" >.git/HEAD &&
	# avoid corrupt/broken HEAD from interfering with repo discovery
	test_must_fail env GIT_DIR=.git git fsck 2>out &&
	test_grep "HEAD points to something strange" out
'

test_expect_success 'HEAD link pointing at a funny object (from different wt)' '
	test_when_finished "mv .git/SAVED_HEAD .git/HEAD" &&
	test_when_finished "rm -rf .git/worktrees wt" &&
	git worktree add wt &&
	mv .git/HEAD .git/SAVED_HEAD &&
	echo $ZERO_OID >.git/HEAD &&
	# avoid corrupt/broken HEAD from interfering with repo discovery
	test_must_fail git -C wt fsck 2>out &&
	test_grep "main-worktree/HEAD: detached HEAD points" out
'

test_expect_success 'other worktree HEAD link pointing at a funny object' '
	test_when_finished "rm -rf .git/worktrees other" &&
	git worktree add other &&
	echo $ZERO_OID >.git/worktrees/other/HEAD &&
	test_must_fail git fsck 2>out &&
	test_grep "worktrees/other/HEAD: detached HEAD points" out
'

test_expect_success 'other worktree HEAD link pointing at missing object' '
	test_when_finished "rm -rf .git/worktrees other" &&
	git worktree add other &&
	echo "Contents missing from repo" | git hash-object --stdin >.git/worktrees/other/HEAD &&
	test_must_fail git fsck 2>out &&
	test_grep "worktrees/other/HEAD: invalid sha1 pointer" out
'

test_expect_success 'other worktree HEAD link pointing at a funny place' '
	test_when_finished "rm -rf .git/worktrees other" &&
	git worktree add other &&
	echo "ref: refs/funny/place" >.git/worktrees/other/HEAD &&
	test_must_fail git fsck 2>out &&
	test_grep "worktrees/other/HEAD points to something strange" out
'

test_expect_success 'commit with multiple signatures is okay' '
	git cat-file commit HEAD >basis &&
	cat >sigs <<-EOF &&
	gpgsig -----BEGIN PGP SIGNATURE-----
	 VGhpcyBpcyBub3QgcmVhbGx5IGEgc2lnbmF0dXJlLg==
	 -----END PGP SIGNATURE-----
	gpgsig-sha256 -----BEGIN PGP SIGNATURE-----
	 VGhpcyBpcyBub3QgcmVhbGx5IGEgc2lnbmF0dXJlLg==
	 -----END PGP SIGNATURE-----
	EOF
	sed -e "/^committer/q" basis >okay &&
	cat sigs >>okay &&
	echo >>okay &&
	sed -e "1,/^$/d" basis >>okay &&
	cat okay &&
	new=$(git hash-object -t commit -w --stdin <okay) &&
	test_when_finished "remove_object $new" &&
	git update-ref refs/heads/bogus "$new" &&
	test_when_finished "git update-ref -d refs/heads/bogus" &&
	git fsck 2>out &&
	cat out &&
	! grep "commit $new" out
'

test_expect_success 'email without @ is okay' '
	git cat-file commit HEAD >basis &&
	sed "s/@/AT/" basis >okay &&
	new=$(git hash-object -t commit -w --stdin <okay) &&
	test_when_finished "remove_object $new" &&
	git update-ref refs/heads/bogus "$new" &&
	test_when_finished "git update-ref -d refs/heads/bogus" &&
	git fsck 2>out &&
	! grep "commit $new" out
'

test_expect_success 'email with embedded > is not okay' '
	git cat-file commit HEAD >basis &&
	sed "s/@[a-z]/&>/" basis >bad-email &&
	new=$(git hash-object --literally -t commit -w --stdin <bad-email) &&
	test_when_finished "remove_object $new" &&
	git update-ref refs/heads/bogus "$new" &&
	test_when_finished "git update-ref -d refs/heads/bogus" &&
	test_must_fail git fsck 2>out &&
	test_grep "error in commit $new" out
'
|
2009-02-19 11:13:39 +00:00
|
|
|
|
2011-08-11 10:21:10 +00:00
|
|
|
test_expect_success 'missing < email delimiter is reported nicely' '
	git cat-file commit HEAD >basis &&
	sed "s/<//" basis >bad-email-2 &&
	new=$(git hash-object --literally -t commit -w --stdin <bad-email-2) &&
	test_when_finished "remove_object $new" &&
	git update-ref refs/heads/bogus "$new" &&
	test_when_finished "git update-ref -d refs/heads/bogus" &&
	test_must_fail git fsck 2>out &&
	test_grep "error in commit $new.* - bad name" out
'

test_expect_success 'missing email is reported nicely' '
	git cat-file commit HEAD >basis &&
	sed "s/[a-z]* <[^>]*>//" basis >bad-email-3 &&
	new=$(git hash-object --literally -t commit -w --stdin <bad-email-3) &&
	test_when_finished "remove_object $new" &&
	git update-ref refs/heads/bogus "$new" &&
	test_when_finished "git update-ref -d refs/heads/bogus" &&
	test_must_fail git fsck 2>out &&
	test_grep "error in commit $new.* - missing email" out
'

test_expect_success '> in name is reported' '
	git cat-file commit HEAD >basis &&
	sed "s/ </> </" basis >bad-email-4 &&
	new=$(git hash-object --literally -t commit -w --stdin <bad-email-4) &&
	test_when_finished "remove_object $new" &&
	git update-ref refs/heads/bogus "$new" &&
	test_when_finished "git update-ref -d refs/heads/bogus" &&
	test_must_fail git fsck 2>out &&
	test_grep "error in commit $new" out
'

# date is 2^64 + 1
test_expect_success 'integer overflow in timestamps is reported' '
	git cat-file commit HEAD >basis &&
	sed "s/^\\(author .*>\\) [0-9]*/\\1 18446744073709551617/" \
		<basis >bad-timestamp &&
	new=$(git hash-object --literally -t commit -w --stdin <bad-timestamp) &&
	test_when_finished "remove_object $new" &&
	git update-ref refs/heads/bogus "$new" &&
	test_when_finished "git update-ref -d refs/heads/bogus" &&
	test_must_fail git fsck 2>out &&
	test_grep "error in commit $new.*integer overflow" out
'
test_expect_success 'commit with NUL in header' '
	git cat-file commit HEAD >basis &&
	sed "s/author ./author Q/" <basis | q_to_nul >commit-NUL-header &&
	new=$(git hash-object --literally -t commit -w --stdin <commit-NUL-header) &&
	test_when_finished "remove_object $new" &&
	git update-ref refs/heads/bogus "$new" &&
	test_when_finished "git update-ref -d refs/heads/bogus" &&
	test_must_fail git fsck 2>out &&
	test_grep "error in commit $new.*unterminated header: NUL at offset" out
'

test_expect_success 'tree object with duplicate entries' '
	test_when_finished "for i in \$T; do remove_object \$i; done" &&
	T=$(
		GIT_INDEX_FILE=test-index &&
		export GIT_INDEX_FILE &&
		rm -f test-index &&
		>x &&
		git add x &&
		git rev-parse :x &&
		T=$(git write-tree) &&
		echo $T &&
		(
			git cat-file tree $T &&
			git cat-file tree $T
		) |
		git hash-object --literally -w -t tree --stdin
	) &&
	test_must_fail git fsck 2>out &&
	test_grep "error in tree .*contains duplicate file entries" out
'

check_duplicate_names () {
	expect=$1 &&
	shift &&
	names=$@ &&
	test_expect_$expect "tree object with duplicate names: $names" '
		test_when_finished "remove_object \$blob" &&
		test_when_finished "remove_object \$tree" &&
		test_when_finished "remove_object \$badtree" &&
		blob=$(echo blob | git hash-object -w --stdin) &&
		printf "100644 blob %s\t%s\n" $blob x.2 >tree &&
		tree=$(git mktree <tree) &&
		for name in $names
		do
			case "$name" in
			*/) printf "040000 tree %s\t%s\n" $tree "${name%/}" ;;
			*) printf "100644 blob %s\t%s\n" $blob "$name" ;;
			esac
		done >badtree &&
		badtree=$(git mktree <badtree) &&
		test_must_fail git fsck 2>out &&
		test_grep "$badtree" out &&
		test_grep "error in tree .*contains duplicate file entries" out
	'
}

check_duplicate_names success x x.1 x/
check_duplicate_names success x x.1.2 x.1/ x/
check_duplicate_names success x x.1 x.1.2 x/

test_expect_success 'unparseable tree object' '
	test_oid_cache <<-\EOF &&
	junk sha1:twenty-bytes-of-junk
	junk sha256:twenty-bytes-of-junk-twelve-more
	EOF

	test_when_finished "git update-ref -d refs/heads/wrong" &&
	test_when_finished "remove_object \$tree_sha1" &&
	test_when_finished "remove_object \$commit_sha1" &&
	junk=$(test_oid junk) &&
	tree_sha1=$(printf "100644 \0$junk" | git hash-object -t tree --stdin -w --literally) &&
	commit_sha1=$(git commit-tree $tree_sha1) &&
	git update-ref refs/heads/wrong $commit_sha1 &&
	test_must_fail git fsck 2>out &&
	test_grep "error: empty filename in tree entry" out &&
	test_grep "$tree_sha1" out &&
	test_grep ! "fatal: empty filename in tree entry" out
'

test_expect_success 'tree entry with type mismatch' '
	test_when_finished "remove_object \$blob" &&
	test_when_finished "remove_object \$tree" &&
	test_when_finished "remove_object \$commit" &&
	test_when_finished "git update-ref -d refs/heads/type_mismatch" &&
	blob=$(echo blob | git hash-object -w --stdin) &&
	blob_bin=$(echo $blob | hex2oct) &&
	tree=$(
		printf "40000 dir\0${blob_bin}100644 file\0${blob_bin}" |
		git hash-object -t tree --stdin -w --literally
	) &&
	commit=$(git commit-tree $tree) &&
	git update-ref refs/heads/type_mismatch $commit &&
	test_must_fail git fsck >out 2>&1 &&
	test_grep "is a blob, not a tree" out &&
	test_grep ! "dangling blob" out
'

test_expect_success 'tree entry with bogus mode' '
	test_when_finished "remove_object \$blob" &&
	test_when_finished "remove_object \$tree" &&
	blob=$(echo blob | git hash-object -w --stdin) &&
	blob_oct=$(echo $blob | hex2oct) &&
	tree=$(printf "100000 foo\0${blob_oct}" |
	       git hash-object -t tree --stdin -w --literally) &&
	git fsck 2>err &&
	cat >expect <<-EOF &&
	warning in tree $tree: badFilemode: contains bad file modes
	EOF
	test_cmp expect err
'

test_expect_success 'tag pointing to nonexistent' '
	badoid=$(test_oid deadbeef) &&
	cat >invalid-tag <<-EOF &&
	object $badoid
	type commit
	tag invalid
	tagger T A Gger <tagger@example.com> 1234567890 -0000

	This is an invalid tag.
	EOF

	tag=$(git hash-object -t tag -w --stdin <invalid-tag) &&
	test_when_finished "remove_object $tag" &&
	echo $tag >.git/refs/tags/invalid &&
	test_when_finished "git update-ref -d refs/tags/invalid" &&
	test_must_fail git fsck --tags >out &&
	test_grep "broken link" out
'

test_expect_success 'tag pointing to something else than its type' '
	sha=$(echo blob | git hash-object -w --stdin) &&
	test_when_finished "remove_object $sha" &&
	cat >wrong-tag <<-EOF &&
	object $sha
	type commit
	tag wrong
	tagger T A Gger <tagger@example.com> 1234567890 -0000

	This is an invalid tag.
	EOF

	tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
	test_when_finished "remove_object $tag" &&
	echo $tag >.git/refs/tags/wrong &&
	test_when_finished "git update-ref -d refs/tags/wrong" &&
	test_must_fail git fsck --tags
'

test_expect_success 'tag with incorrect tag name & missing tagger' '
	sha=$(git rev-parse HEAD) &&
	cat >wrong-tag <<-EOF &&
	object $sha
	type commit
	tag wrong name format

	This is an invalid tag.
	EOF

	tag=$(git hash-object --literally -t tag -w --stdin <wrong-tag) &&
	test_when_finished "remove_object $tag" &&
	echo $tag >.git/refs/tags/wrong &&
	test_when_finished "git update-ref -d refs/tags/wrong" &&
	git fsck --tags 2>out &&

	cat >expect <<-EOF &&
	warning in tag $tag: badTagName: invalid '\''tag'\'' name: wrong name format
	warning in tag $tag: missingTaggerEntry: invalid format - expected '\''tagger'\'' line
	EOF
	test_cmp expect out
'

test_expect_success 'tag with bad tagger' '
	sha=$(git rev-parse HEAD) &&
	cat >wrong-tag <<-EOF &&
	object $sha
	type commit
	tag not-quite-wrong
	tagger Bad Tagger Name

	This is an invalid tag.
	EOF

	tag=$(git hash-object --literally -t tag -w --stdin <wrong-tag) &&
	test_when_finished "remove_object $tag" &&
	echo $tag >.git/refs/tags/wrong &&
	test_when_finished "git update-ref -d refs/tags/wrong" &&
	test_must_fail git fsck --tags 2>out &&
	test_grep "error in tag .*: invalid author/committer" out
'

test_expect_success 'tag with NUL in header' '
	sha=$(git rev-parse HEAD) &&
	q_to_nul >tag-NUL-header <<-EOF &&
	object $sha
	type commit
	tag contains-Q-in-header
	tagger T A Gger <tagger@example.com> 1234567890 -0000

	This is an invalid tag.
	EOF

	tag=$(git hash-object --literally -t tag -w --stdin <tag-NUL-header) &&
	test_when_finished "remove_object $tag" &&
	echo $tag >.git/refs/tags/wrong &&
	test_when_finished "git update-ref -d refs/tags/wrong" &&
	test_must_fail git fsck --tags 2>out &&
	test_grep "error in tag $tag.*unterminated header: NUL at offset" out
'

test_expect_success 'cleaned up' '
	git fsck >actual 2>&1 &&
	test_must_be_empty actual
'

test_expect_success 'rev-list --verify-objects' '
	git rev-list --verify-objects --all >/dev/null 2>out &&
	test_must_be_empty out
'

test_expect_success 'rev-list --verify-objects with bad sha1' '
	sha=$(echo blob | git hash-object -w --stdin) &&
	old=$(test_oid_to_path $sha) &&
	new=$(dirname $old)/$(test_oid ff_2) &&
	sha="$(dirname $new)$(basename $new)" &&
	mv .git/objects/$old .git/objects/$new &&
	test_when_finished "remove_object $sha" &&
	git update-index --add --cacheinfo 100644 $sha foo &&
	test_when_finished "git read-tree -u --reset HEAD" &&
	tree=$(git write-tree) &&
	test_when_finished "remove_object $tree" &&
	cmt=$(echo bogus | git commit-tree $tree) &&
	test_when_finished "remove_object $cmt" &&
	git update-ref refs/heads/bogus $cmt &&
	test_when_finished "git update-ref -d refs/heads/bogus" &&

	test_might_fail git rev-list --verify-objects refs/heads/bogus >/dev/null 2>out &&
	test_grep -q "error: hash mismatch $(dirname $new)$(test_oid ff_2)" out
'

# An actual bit corruption is more likely than swapped commits, but
# this provides an easy way to have commits which don't match their purported
# hashes, but which aren't so broken we can't read them at all.
test_expect_success 'rev-list --verify-objects notices swapped commits' '
	git init swapped-commits &&
	(
		cd swapped-commits &&
		test_commit one &&
		test_commit two &&
		one_oid=$(git rev-parse HEAD) &&
		two_oid=$(git rev-parse HEAD^) &&
		one=.git/objects/$(test_oid_to_path $one_oid) &&
		two=.git/objects/$(test_oid_to_path $two_oid) &&
		mv $one tmp &&
		mv $two $one &&
		mv tmp $two &&
		test_must_fail git rev-list --verify-objects HEAD
	)
'

test_expect_success 'set up repository with commit-graph' '
	git init corrupt-graph &&
	(
		cd corrupt-graph &&
		test_commit one &&
		test_commit two &&
		git commit-graph write --reachable
	)
'

corrupt_graph_obj () {
	oid=$(git -C corrupt-graph rev-parse "$1") &&
	obj=corrupt-graph/.git/objects/$(test_oid_to_path $oid) &&
	test_when_finished 'mv backup $obj' &&
	mv $obj backup &&
	echo garbage >$obj
}

test_expect_success 'rev-list --verify-objects with commit graph (tip)' '
	corrupt_graph_obj HEAD &&
	test_must_fail git -C corrupt-graph rev-list --verify-objects HEAD
'

test_expect_success 'rev-list --verify-objects with commit graph (parent)' '
	corrupt_graph_obj HEAD^ &&
	test_must_fail git -C corrupt-graph rev-list --verify-objects HEAD
'

test_expect_success 'force fsck to ignore double author' '
	git cat-file commit HEAD >basis &&
	sed "s/^author .*/&,&/" <basis | tr , \\n >multiple-authors &&
	new=$(git hash-object --literally -t commit -w --stdin <multiple-authors) &&
	test_when_finished "remove_object $new" &&
	git update-ref refs/heads/bogus "$new" &&
	test_when_finished "git update-ref -d refs/heads/bogus" &&
	test_must_fail git fsck &&
	git -c fsck.multipleAuthors=ignore fsck
'

_bz='\0'
_bzoid=$(printf $ZERO_OID | sed -e 's/00/\\0/g')

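# For illustration, the sed substitution above expands every "00" byte pair
# of the all-zero object id into a printf octal escape. A standalone sketch
# with the SHA-1 zero id written out literally (an assumption for the
# example; in the script $ZERO_OID is supplied by the test library):

```shell
# Sketch: for SHA-1, $ZERO_OID is forty "0" characters, so _bzoid becomes
# twenty "\0" escapes, i.e. twenty NUL bytes once printf interprets them.
printf '%s' "0000000000000000000000000000000000000000" | sed -e 's/00/\\0/g'
```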
test_expect_success 'fsck notices blob entry pointing to null sha1' '
	(git init null-blob &&
	 cd null-blob &&
	 sha=$(printf "100644 file$_bz$_bzoid" |
	       git hash-object --literally -w --stdin -t tree) &&
	 git fsck 2>out &&
	 test_grep "warning.*null sha1" out
	)
'
test_expect_success 'fsck notices submodule entry pointing to null sha1' '
	(git init null-commit &&
	 cd null-commit &&
	 sha=$(printf "160000 submodule$_bz$_bzoid" |
	       git hash-object --literally -w --stdin -t tree) &&
	 git fsck 2>out &&
	 test_grep "warning.*null sha1" out
	)
'
test_expect_success 'fsck notices excessively large tree entry name' '
	git init large-name &&
	(
		cd large-name &&
		test_commit a-long-name &&
		git -c fsck.largePathname=warn:10 fsck 2>out &&
		grep "warning.*large pathname" out
	)
'
while read name path pretty; do
	while read mode type; do
		: ${pretty:=$path}
		test_expect_success "fsck notices $pretty as $type" '
			(
				git init $name-$type &&
				cd $name-$type &&
				git config core.protectNTFS false &&
				echo content >file &&
				git add file &&
				git commit -m base &&
				blob=$(git rev-parse :file) &&
				tree=$(git rev-parse HEAD^{tree}) &&
				value=$(eval "echo \$$type") &&
				printf "$mode $type %s\t%s" "$value" "$path" >bad &&
				bad_tree=$(git mktree <bad) &&
				git fsck 2>out &&
				test_grep "warning.*tree $bad_tree" out
			)'
	done <<-\EOF
	100644 blob
	040000 tree
	EOF
done <<-EOF
dot .
dotdot ..
dotgit .git
dotgit-case .GIT
dotgit-unicode .gI${u200c}T .gI{u200c}T
dotgit-case2 .Git
git-tilde1 git~1
dotgitdot .git.
dot-backslash-case .\\\\.GIT\\\\foobar
dotgit-case-backslash .git\\\\foobar
EOF
test_expect_success 'fsck allows .Ňit' '
	(
		git init not-dotgit &&
		cd not-dotgit &&
		echo content >file &&
		git add file &&
		git commit -m base &&
		blob=$(git rev-parse :file) &&
		printf "100644 blob $blob\t.\\305\\207it" >tree &&
		tree=$(git mktree <tree) &&
		git fsck 2>err &&
		test_line_count = 0 err
	)
'
test_expect_success 'NUL in commit' '
	rm -fr nul-in-commit &&
	git init nul-in-commit &&
	(
		cd nul-in-commit &&
		git commit --allow-empty -m "initial commitQNUL after message" &&
		git cat-file commit HEAD >original &&
		q_to_nul <original >munged &&
		git hash-object --literally -w -t commit --stdin <munged >name &&
		git branch bad $(cat name) &&

		test_must_fail git -c fsck.nulInCommit=error fsck 2>warn.1 &&
		test_grep nulInCommit warn.1 &&
		git fsck 2>warn.2 &&
		test_grep nulInCommit warn.2
	)
'
# create a static test repo which is broken by omitting
# one particular object ($1, which is looked up via rev-parse
# in the new repository).
create_repo_missing () {
	rm -rf missing &&
	git init missing &&
	(
		cd missing &&
		git commit -m one --allow-empty &&
		mkdir subdir &&
		echo content >subdir/file &&
		git add subdir/file &&
		git commit -m two &&
		unrelated=$(echo unrelated | git hash-object --stdin -w) &&
		git tag -m foo tag $unrelated &&
		sha1=$(git rev-parse --verify "$1") &&
		path=$(echo $sha1 | sed 's|..|&/|') &&
		rm .git/objects/$path
	)
}
test_expect_success 'fsck notices missing blob' '
	create_repo_missing HEAD:subdir/file &&
	test_must_fail git -C missing fsck
'

test_expect_success 'fsck notices missing subtree' '
	create_repo_missing HEAD:subdir &&
	test_must_fail git -C missing fsck
'

test_expect_success 'fsck notices missing root tree' '
	create_repo_missing HEAD^{tree} &&
	test_must_fail git -C missing fsck
'

test_expect_success 'fsck notices missing parent' '
	create_repo_missing HEAD^ &&
	test_must_fail git -C missing fsck
'

test_expect_success 'fsck notices missing tagged object' '
	create_repo_missing tag^{blob} &&
	test_must_fail git -C missing fsck
'

test_expect_success 'fsck notices ref pointing to missing commit' '
	create_repo_missing HEAD &&
	test_must_fail git -C missing fsck
'

test_expect_success 'fsck notices ref pointing to missing tag' '
	create_repo_missing tag &&
	test_must_fail git -C missing fsck
'
test_expect_success 'fsck --connectivity-only' '
	rm -rf connectivity-only &&
	git init connectivity-only &&
	(
		cd connectivity-only &&
		touch empty &&
		git add empty &&
		test_commit empty &&

		# Drop the index now; we want to be sure that we
		# recursively notice the broken objects
		# because they are reachable from refs, not because
		# they are in the index.
		rm -f .git/index &&

		# corrupt the blob, but in a way that we can still identify
		# its type. That lets us see that --connectivity-only is
		# not actually looking at the contents, but leaves it
		# free to examine the type if it chooses.
		empty=.git/objects/$(test_oid_to_path $EMPTY_BLOB) &&
		blob=$(echo unrelated | git hash-object -w --stdin) &&
		mv -f $(sha1_file $blob) $empty &&

		test_must_fail git fsck --strict &&
		git fsck --strict --connectivity-only &&
		tree=$(git rev-parse HEAD:) &&
		suffix=${tree#??} &&
		tree=.git/objects/${tree%$suffix}/$suffix &&
		rm -f $tree &&
		echo invalid >$tree &&
		test_must_fail git fsck --strict --connectivity-only
	)
'
test_expect_success 'fsck --connectivity-only with explicit head' '
	rm -rf connectivity-only &&
	git init connectivity-only &&
	(
		cd connectivity-only &&
		test_commit foo &&
		rm -f .git/index &&
		tree=$(git rev-parse HEAD^{tree}) &&
		remove_object $(git rev-parse HEAD:foo.t) &&
		test_must_fail git fsck --connectivity-only $tree
	)
'
test_expect_success 'fsck --name-objects' '
	rm -rf name-objects &&
	git init name-objects &&
	(
		cd name-objects &&
		git config core.logAllRefUpdates false &&
		test_commit julius caesar.t &&
		test_commit augustus44 &&
		test_commit caesar &&
		remove_object $(git rev-parse julius:caesar.t) &&
		tree=$(git rev-parse --verify julius:) &&
		git tag -d julius &&
		test_must_fail git fsck --name-objects >out &&
		test_grep "$tree (refs/tags/augustus44\\^:" out
	)
'
test_expect_success 'alternate objects are correctly blamed' '
	test_when_finished "rm -rf alt.git .git/objects/info/alternates" &&
	name=$(test_oid numeric) &&
	path=$(test_oid_to_path "$name") &&
	git init --bare alt.git &&
	echo "../../alt.git/objects" >.git/objects/info/alternates &&
	mkdir alt.git/objects/$(dirname $path) &&
	>alt.git/objects/$(dirname $path)/$(basename $path) &&
	test_must_fail git fsck >out 2>&1 &&
	test_grep alt.git out
'
|
|
|
|
|
2017-01-13 17:55:55 +00:00
|
|
|
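The alternates file written in the test above has a simple format: one path per line, each naming another objects directory, resolved relative to the borrowing repository's own objects directory. A standalone sketch of writing such a file (alternates-demo is a scratch file here, not a live repository's .git/objects/info/alternates):

```shell
# One borrowed objects directory per line; relative paths are resolved
# from the .git/objects/ directory of the repository doing the borrowing.
printf '%s\n' "../../alt.git/objects" >alternates-demo
cat alternates-demo
# prints ../../alt.git/objects
```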
test_expect_success 'fsck errors in packed objects' '
	git cat-file commit HEAD >basis &&
	sed "s/</one/" basis >one &&
	sed "s/</foo/" basis >two &&
	one=$(git hash-object --literally -t commit -w one) &&
	two=$(git hash-object --literally -t commit -w two) &&
	pack=$(
		{
			echo $one &&
			echo $two
		} | git pack-objects .git/objects/pack/pack
	) &&
	test_when_finished "rm -f .git/objects/pack/pack-$pack.*" &&
	remove_object $one &&
	remove_object $two &&
	test_must_fail git fsck 2>out &&
	test_grep "error in commit $one.* - bad name" out &&
	test_grep "error in commit $two.* - bad name" out &&
	! grep corrupt out
'
test_expect_success 'fsck fails on corrupt packfile' '
	hsh=$(git commit-tree -m mycommit HEAD^{tree}) &&
	pack=$(echo $hsh | git pack-objects .git/objects/pack/pack) &&

	# Corrupt the first byte of the first object. (It contains 3 type bits,
	# at least one of which is not zero, so setting the first byte to 0 is
	# sufficient.)
	chmod a+w .git/objects/pack/pack-$pack.pack &&
	printf "\0" | dd of=.git/objects/pack/pack-$pack.pack bs=1 conv=notrunc seek=12 &&

	test_when_finished "rm -f .git/objects/pack/pack-$pack.*" &&
	remove_object $hsh &&
	test_must_fail git fsck 2>out &&
	test_grep "checksum mismatch" out
'
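The seek=12 in the dd invocation above is not arbitrary: a packfile begins with a 12-byte header (4 bytes of "PACK" magic, a 4-byte big-endian version, a 4-byte object count), so offset 12 is the first byte of the first object entry. A standalone sketch of just that header layout (fake.pack is a scratch file, not a real pack):

```shell
# Build only the 12-byte pack header: "PACK" magic, version 2, one object.
# The first object's data would begin at byte offset 12.
printf 'PACK\0\0\0\2\0\0\0\1' >fake.pack
magic=$(dd if=fake.pack bs=1 count=4 2>/dev/null)
echo "$magic"
# prints PACK
```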
fsck: parse loose object paths directly
When we iterate over the list of loose objects to check, we
get the actual path of each object. But we then throw it
away and pass just the sha1 to fsck_sha1(), which will do a
fresh lookup. Usually it would find the same object, but it
may not if an object exists both as a loose and a packed
object. We may end up checking the packed object twice, and
never look at the loose one.
In practice this isn't too terrible, because if fsck doesn't
complain, it means you have at least one good copy. But
since the point of fsck is to look for corruption, we should
be thorough.
The new read_loose_object() interface can help us get the
data from disk, and then we replace parse_object() with
parse_object_buffer(). As a bonus, our error messages now
mention the path to a corrupted object, which should make it
easier to track down errors when they do happen.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-13 17:59:44 +00:00
test_expect_success 'fsck finds problems in duplicate loose objects' '
	rm -rf broken-duplicate &&
	git init broken-duplicate &&
	(
		cd broken-duplicate &&
		test_commit duplicate &&

		# no "-d" here, so we end up with duplicates
		git repack &&

		# now corrupt the loose copy
		oid="$(git rev-parse HEAD)" &&
		file=$(sha1_file "$oid") &&
		rm "$file" &&
		echo broken >"$file" &&
		test_must_fail git fsck 2>err &&

		cat >expect <<-EOF &&
		error: inflate: data stream error (incorrect header check)
		error: unable to unpack header of $file
		error: $oid: object corrupt or missing: $file
		EOF
		grep "^error: " err >actual &&
		test_cmp expect actual
	)
'
test_expect_success 'fsck detects trailing loose garbage (commit)' '
	git cat-file commit HEAD >basis &&
	echo bump-commit-sha1 >>basis &&
	commit=$(git hash-object -w -t commit basis) &&
	file=$(sha1_file $commit) &&
	test_when_finished "remove_object $commit" &&
	chmod +w "$file" &&
	echo garbage >>"$file" &&
	test_must_fail git fsck 2>out &&
	test_grep "garbage.*$commit" out
'

test_expect_success 'fsck detects trailing loose garbage (large blob)' '
	blob=$(echo trailing | git hash-object -w --stdin) &&
	file=$(sha1_file $blob) &&
	test_when_finished "remove_object $blob" &&
	chmod +w "$file" &&
	echo garbage >>"$file" &&
	test_must_fail git -c core.bigfilethreshold=5 fsck 2>out &&
	test_grep "garbage.*$blob" out
'

check_stream_sha1(): handle input underflow
This commit fixes an infinite loop when fscking large
truncated loose objects.
The check_stream_sha1() function takes an mmap'd loose
object buffer and streams 4k of output at a time, checking
its sha1. The loop quits when we've output enough bytes (we
know the size from the object header), or when zlib tells us
anything except Z_OK or Z_BUF_ERROR.
The latter is expected because zlib may run out of room in
our 4k buffer, and that is how it tells us to process the
output and loop again.
But Z_BUF_ERROR also covers another case: one in which zlib
cannot make forward progress because it needs more _input_.
This should never happen in this loop, because though we're
streaming the output, we have the entire deflated input
available in the mmap'd buffer. But since we don't check
this case, we'll just loop infinitely if we do see a
truncated object, thinking that zlib is asking for more
output space.
It's tempting to fix this by checking stream->avail_in as
part of the loop condition (and quitting if all of our bytes
have been consumed). But that assumes that once zlib has
consumed the input, there is nothing left to do. That's not
necessarily the case: it may have read our input into its
internal state, but still have bytes to output.
Instead, let's continue on Z_BUF_ERROR only when we see the
case we're expecting: the previous round filled our output
buffer completely. If it didn't (and we still saw
Z_BUF_ERROR), we know something is wrong and should break
out of the loop.
The bug comes from commit f6371f9210 (sha1_file: add
read_loose_object() function, 2017-01-13), which
reimplemented some of the existing loose object functions.
So it's worth checking if this bug was inherited from any of
those. The answer seems to be no. The two obvious
candidates are both OK:
1. unpack_sha1_rest(); this doesn't need to loop on
Z_BUF_ERROR at all, since it allocates the expected
output buffer in advance (which we can't do since we're
explicitly streaming here)
2. check_object_signature(); the streaming path relies on
the istream interface, which uses read_istream_loose()
for this case. That function uses a similar "is our
output buffer full" check with Z_BUF_ERROR (which is
where I stole it from for this patch!)
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-30 23:23:12 +00:00
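The test below exercises the input-underflow fix by truncating a loose object. A minimal sketch of the same truncation trick using only dd (file names and sizes here are illustrative, matching the 4096/1024 split the test uses):

```shell
# Create a 4096-byte file, then keep only its first 1024 bytes,
# mimicking what test_copy_bytes does to the loose object below.
dd if=/dev/zero bs=512 count=8 2>/dev/null >big.bin
dd if=big.bin bs=1024 count=1 2>/dev/null >tmp.bin &&
mv -f tmp.bin big.bin
wc -c <big.bin
# the file is now 1024 bytes
```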
test_expect_success 'fsck detects truncated loose object' '
	# make it big enough that we know we will truncate in the data
	# portion, not the header
	test-tool genrandom truncate 4096 >file &&
	blob=$(git hash-object -w file) &&
	file=$(sha1_file $blob) &&
	test_when_finished "remove_object $blob" &&
	test_copy_bytes 1024 <"$file" >tmp &&
	rm "$file" &&
	mv -f tmp "$file" &&

	# check both regular and streaming code paths
	test_must_fail git fsck 2>out &&
	test_grep corrupt.*$blob out &&

	test_must_fail git -c core.bigfilethreshold=128 fsck 2>out &&
	test_grep corrupt.*$blob out
'

# for each type, we have one version which is referenced by another object
# (and so while unreachable, not dangling), and another variant which really is
# dangling.

fsck: always compute USED flags for unreachable objects
The --connectivity-only option avoids opening every object, and instead
just marks reachable objects with a flag and compares this to the set
of all objects. This strategy is discussed in more detail in 3e3f8bd608
(fsck: prepare dummy objects for --connectivity-check, 2017-01-17).
This means that we report _every_ unreachable object as dangling.
Whereas in a full fsck, we'd have actually opened and parsed each of
those unreachable objects, marking their child objects with the USED
flag, to mean "this was mentioned by another object". And thus we can
report only the tip of an unreachable segment of the object graph as
dangling.
You can see this difference with a trivial example:
tree=$(git hash-object -t tree -w /dev/null)
one=$(echo one | git commit-tree $tree)
two=$(echo two | git commit-tree -p $one $tree)
Running `git fsck` will report only $two as dangling, but with
--connectivity-only, both commits (and the tree) are reported. Likewise,
using --lost-found would write all three objects.
We can make --connectivity-only work like the normal case by taking a
separate pass over the unreachable objects, parsing them and marking
objects they refer to as USED. That still avoids parsing any blobs,
though we do pay the cost to access any unreachable commits and trees
(which may or may not be noticeable, depending on how many you have).
If neither --dangling nor --lost-found is in effect, then we can skip
this step entirely, just like we do now. That makes "--connectivity-only
--no-dangling" just as fast as the current "--connectivity-only". I.e.,
we do the correct thing always, but you can still tweak the options to
make it faster if you don't care about dangling objects.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-05 04:47:39 +00:00
test_expect_success 'create dangling-object repository' '
	git init dangling &&
	(
		cd dangling &&
		blob=$(echo not-dangling | git hash-object -w --stdin) &&
		dblob=$(echo dangling | git hash-object -w --stdin) &&
		tree=$(printf "100644 blob %s\t%s\n" $blob one | git mktree) &&
		dtree=$(printf "100644 blob %s\t%s\n" $blob two | git mktree) &&
		commit=$(git commit-tree $tree) &&
		dcommit=$(git commit-tree -p $commit $tree) &&

		cat >expect <<-EOF
		dangling blob $dblob
		dangling commit $dcommit
		dangling tree $dtree
		EOF
	)
'
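Each line piped to git mktree above follows the fixed entry format "<mode> SP <type> SP <oid> TAB <name>"; the tab before the filename is mandatory. A sketch of building one such entry (the OID is a 40-hex-digit placeholder, not a real object):

```shell
# Build a single mktree input line: mode, type, oid, then a literal tab
# before the entry name.
oid=0123456789abcdef0123456789abcdef01234567
entry=$(printf '100644 blob %s\t%s' "$oid" one)
echo "$entry"
```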
test_expect_success 'fsck notices dangling objects' '
	(
		cd dangling &&
		git fsck >actual &&
		# the output order is non-deterministic, as it comes from a hash
		sort <actual >actual.sorted &&
		test_cmp expect actual.sorted
	)
'

test_expect_success 'fsck --connectivity-only notices dangling objects' '
	(
		cd dangling &&
		git fsck --connectivity-only >actual &&
		# the output order is non-deterministic, as it comes from a hash
		sort <actual >actual.sorted &&
		test_cmp expect actual.sorted
	)
'

test_expect_success 'fsck $name notices bogus $name' '
	test_must_fail git fsck bogus &&
	test_must_fail git fsck $ZERO_OID
'

test_expect_success 'bogus head does not fallback to all heads' '
	# set up a case that will cause a reachability complaint
	echo to-be-deleted >foo &&
	git add foo &&
	blob=$(git rev-parse :foo) &&
	test_when_finished "git rm --cached foo" &&
	remove_object $blob &&
	test_must_fail git fsck $ZERO_OID >out 2>&1 &&
	! grep $blob out
'

# Corrupt the checksum on the index.
|
|
|
|
# Add 1 to the last byte in the SHA.
|
|
|
|
corrupt_index_checksum () {
|
|
|
|
perl -w -e '
|
|
|
|
use Fcntl ":seek";
|
|
|
|
open my $fh, "+<", ".git/index" or die "open: $!";
|
|
|
|
binmode $fh;
|
|
|
|
seek $fh, -1, SEEK_END or die "seek: $!";
|
|
|
|
read $fh, my $in_byte, 1 or die "read: $!";
|
|
|
|
|
|
|
|
$in_value = unpack("C", $in_byte);
|
|
|
|
$out_value = ($in_value + 1) & 255;
|
|
|
|
|
|
|
|
$out_byte = pack("C", $out_value);
|
|
|
|
|
|
|
|
seek $fh, -1, SEEK_END or die "seek: $!";
|
|
|
|
print $fh $out_byte;
|
|
|
|
close $fh or die "close: $!";
|
|
|
|
'
|
|
|
|
}
|
|
|
|
|
|
|
|
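# The perl helper above increments the index file's trailing checksum byte
# in place. The same edit can be sketched with plain od/dd on a scratch
# file (illustrative only; the tests themselves use the perl helper):

```shell
# Increment the last byte of a 4-byte scratch file, as the perl helper
# does to .git/index. Here the file is "abcd", so the final "d" (100)
# becomes "e" (101).
f=$(mktemp)
printf 'abcd' >"$f"
# read the last byte as an unsigned decimal value
last=$(od -An -tu1 -j3 -N1 "$f" | tr -d ' \n')
# add 1 modulo 256, then write the byte back at the same offset
next=$(( (last + 1) & 255 ))
printf "\\$(printf '%03o' "$next")" | dd of="$f" bs=1 seek=3 conv=notrunc 2>/dev/null
tail -c 1 "$f"
```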
# Corrupt the checksum on the index and then
# verify that only fsck notices.
test_expect_success 'detect corrupt index file in fsck' '
	cp .git/index .git/index.backup &&
	test_when_finished "mv .git/index.backup .git/index" &&
	corrupt_index_checksum &&
	test_must_fail git fsck --cache 2>errors &&
	test_grep "bad index file" errors
'

test_expect_success 'fsck error and recovery on invalid object type' '
	git init --bare garbage-type &&
	(
		cd garbage-type &&

		garbage_blob=$(git hash-object --stdin -w -t garbage --literally </dev/null) &&

		test_must_fail git fsck 2>err &&
		grep -e "^error" -e "^fatal" err >errors &&
		test_line_count = 1 errors &&
		grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err
	)
'

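# The '"'"' runs in the grep pattern above are the usual way to splice a
# literal single quote into a single-quoted test body: close the quoted
# string, emit a double-quoted ', then reopen it. Outside a test body the
# same pattern reduces to:

```shell
# Three concatenated pieces: '...type ' then "'" then 'garbage' then "'"
# then ':' — the shell glues adjacent quoted strings into one word.
printf '%s\n' 'object is of unknown type '"'"'garbage'"'"':'
```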
test_expect_success 'fsck error on gitattributes with excessive line lengths' '
	blob=$(printf "pattern %02048d" 1 | git hash-object -w --stdin) &&
	test_when_finished "remove_object $blob" &&
	tree=$(printf "100644 blob %s\t%s\n" $blob .gitattributes | git mktree) &&
	test_when_finished "remove_object $tree" &&
	cat >expected <<-EOF &&
	error in blob $blob: gitattributesLineLength: .gitattributes has too long lines to parse
	EOF
	test_must_fail git fsck --no-dangling >actual 2>&1 &&
	test_cmp expected actual
'

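# A quick check of why that blob trips the line-length check: "pattern "
# plus a zero-padded field of width 2048 yields a 2056-byte line, past the
# roughly 2048-byte buffer fsck is willing to parse (the exact constant is
# an assumption inferred from the test's construction):

```shell
# 8 bytes of "pattern " plus a 2048-character zero-padded integer
printf "pattern %02048d" 1 | wc -c
```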
test_expect_success 'fsck error on gitattributes with excessive size' '
	blob=$(test-tool genzeros $((100 * 1024 * 1024 + 1)) | git hash-object -w --stdin) &&
	test_when_finished "remove_object $blob" &&
	tree=$(printf "100644 blob %s\t%s\n" $blob .gitattributes | git mktree) &&
	test_when_finished "remove_object $tree" &&
	cat >expected <<-EOF &&
	error in blob $blob: gitattributesLarge: .gitattributes too large to parse
	EOF
	test_must_fail git fsck --no-dangling >actual 2>&1 &&
	test_cmp expected actual
'

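# The genzeros blob is sized to land exactly one byte past a 100 MiB cap
# (the cap itself is inferred from the test's arithmetic, not stated in
# this file):

```shell
echo $((100 * 1024 * 1024))       # the inferred cap in bytes
echo $((100 * 1024 * 1024 + 1))   # the generated blob's size
```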
test_expect_success 'fsck detects problems in worktree index' '
	test_when_finished "git worktree remove -f wt" &&
	git worktree add wt &&

	echo "this will be removed to break the worktree index" >wt/file &&
	git -C wt add file &&
	blob=$(git -C wt rev-parse :file) &&
	remove_object $blob &&

	test_must_fail git fsck --name-objects >actual 2>&1 &&
	cat >expect <<-EOF &&
	missing blob $blob (.git/worktrees/wt/index:file)
	EOF
	test_cmp expect actual
'

test_expect_success 'fsck reports problems in current worktree index without filename' '
	test_when_finished "rm -f .git/index && git read-tree HEAD" &&
	echo "this object will be removed to break current worktree index" >file &&
	git add file &&
	blob=$(git rev-parse :file) &&
	remove_object $blob &&

	test_must_fail git fsck --name-objects >actual 2>&1 &&
	cat >expect <<-EOF &&
	missing blob $blob (:file)
	EOF
	test_cmp expect actual
'

test_done