git/t/t6022-rev-list-missing.sh

150 lines
4.1 KiB
Bash
Raw Normal View History

#!/bin/sh
test_description='handling of missing objects in rev-list'
TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh
# We setup the repository with two commits, this way HEAD is always
# available and we can hide commit 1.
test_expect_success 'create repository and alternate directory' '
test_commit 1 &&
test_commit 2 &&
test_commit 3 &&
git tag -m "tag message" annot_tag HEAD~1 &&
git tag regul_tag HEAD~1 &&
git branch a_branch HEAD~1
'
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default In 7a5d604443 (commit: detect commits that exist in commit-graph but not in the ODB, 2023-10-31), we have introduced a new object existence check into `repo_parse_commit_internal()` so that we do not parse commits via the commit-graph that don't have a corresponding object in the object database. This new check of course comes with a performance penalty, which the commit put at around 30% for `git rev-list --topo-order`. But there are in fact scenarios where the performance regression is even higher. The following benchmark against linux.git with a fully-build commit-graph: Benchmark 1: git.v2.42.1 rev-list --count HEAD Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms] Range (min … max): 650.2 ms … 666.0 ms 10 runs Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s] Range (min … max): 1.302 s … 1.361 s 10 runs Summary git.v2.42.1 rev-list --count HEAD ran 2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD While it's a noble goal to ensure that results are the same regardless of whether or not we have a potentially stale commit-graph, taking twice as much time is a tough sell. Furthermore, we can generally assume that the commit-graph will be updated by git-gc(1) or git-maintenance(1) as required so that the case where the commit-graph is stale should not at all be common. With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore the behaviour and thus performance previous to the mentioned commit. In order to not be inconsistent, also disable this behaviour by default in `lookup_commit_in_graph()`, where the object existence check has been introduced right at its inception via f559d6d45e (revision: avoid hitting packfiles when commits are in commit-graph, 2021-08-09). This results in another speedup in commands that end up calling this function, even though it's less pronounced compared to the above benchmark. The following has been executed in linux.git with ~1.2 million references: Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s] Range (min … max): 2.943 s … 2.949 s 3 runs Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s] Range (min … max): 2.704 s … 2.759 s 3 runs Summary GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran 1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted So whereas 7a5d604443 initially introduced the logic to start doing an object existence check in `repo_parse_commit_internal()` by default, the updated logic will now instead cause `lookup_commit_in_graph()` to stop doing the check by default. This behaviour continues to be tweakable by the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable. Note that this requires us to amend some tests to manually turn on the paranoid checks again. This is because we cause repository corruption by manually deleting objects which are part of the commit graph already. These circumstances shouldn't usually happen in repositories. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-24 11:08:21 +00:00
# We manually corrupt the repository, which means that the commit-graph may
# contain references to already-deleted objects. We thus need to enable
# commit-graph paranoia to not returned these deleted commits from the graph.
GIT_COMMIT_GRAPH_PARANOIA=true
export GIT_COMMIT_GRAPH_PARANOIA
for obj in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
do
test_expect_success "rev-list --missing=error fails with missing object $obj" '
oid="$(git rev-parse $obj)" &&
path=".git/objects/$(test_oid_to_path $oid)" &&
mv "$path" "$path.hidden" &&
test_when_finished "mv $path.hidden $path" &&
test_must_fail git rev-list --missing=error --objects \
--no-object-names HEAD
'
done
for obj in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
do
for action in "allow-any" "print"
do
test_expect_success "rev-list --missing=$action with missing $obj" '
oid="$(git rev-parse $obj)" &&
path=".git/objects/$(test_oid_to_path $oid)" &&
# Before the object is made missing, we use rev-list to
# get the expected oids.
git rev-list --objects --no-object-names \
HEAD ^$obj >expect.raw &&
# Blobs are shared by all commits, so even though a commit/tree
# might be skipped, its blob must be accounted for.
if test $obj != "HEAD:1.t"
then
echo $(git rev-parse HEAD:1.t) >>expect.raw &&
echo $(git rev-parse HEAD:2.t) >>expect.raw
fi &&
mv "$path" "$path.hidden" &&
test_when_finished "mv $path.hidden $path" &&
git rev-list --missing=$action --objects --no-object-names \
HEAD >actual.raw &&
# When the action is to print, we should also add the missing
# oid to the expect list.
case $action in
allow-any)
;;
print)
grep ?$oid actual.raw &&
echo ?$oid >>expect.raw
;;
esac &&
sort actual.raw >actual &&
sort expect.raw >expect &&
test_cmp expect actual
'
done
done
for missing_tip in "annot_tag" "regul_tag" "a_branch" "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
rev-list: allow missing tips with --missing=[print|allow*] In 9830926c7d (rev-list: add commit object support in `--missing` option, 2023-10-27) we fixed the `--missing` option in `git rev-list` so that it works with with missing commits, not just blobs/trees. Unfortunately, such a command would still fail with a "fatal: bad object <oid>" if it is passed a missing commit, blob or tree as an argument (before the rev walking even begins). When such a command is used to find the dependencies of some objects, for example the dependencies of quarantined objects (see the "QUARANTINE ENVIRONMENT" section in the git-receive-pack(1) documentation), it would be better if the command would instead consider such missing objects, especially commits, in the same way as other missing objects. If, for example `--missing=print` is used, it would be nice for some use cases if the missing tips passed as arguments were reported in the same way as other missing objects instead of the command just failing. We could introduce a new option to make it work like this, but most users are likely to prefer the command to have this behavior as the default one. Introducing a new option would require another dumb loop to look for that option early, which isn't nice. Also we made `git rev-list` work with missing commits very recently and the command is most often passed commits as arguments. So let's consider this as a bug fix related to these recent changes. While at it let's add a NEEDSWORK comment to say that we should get rid of the existing ugly dumb loops that parse the `--exclude-promisor-objects` and `--missing=...` options early. Helped-by: Linus Arver <linusa@google.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-14 14:25:13 +00:00
do
# We want to check that things work when both
# - all the tips passed are missing (case existing_tip = ""), and
# - there is one missing tip and one existing tip (case existing_tip = "HEAD")
for existing_tip in "" "HEAD"
do
for action in "allow-any" "print"
do
test_expect_success "--missing=$action with tip '$missing_tip' missing and tip '$existing_tip'" '
# Before the object is made missing, we use rev-list to
# get the expected oids.
if test "$existing_tip" = "HEAD"
then
git rev-list --objects --no-object-names \
HEAD ^$missing_tip >expect.raw
else
>expect.raw
fi &&
# Blobs are shared by all commits, so even though a commit/tree
# might be skipped, its blob must be accounted for.
if test "$existing_tip" = "HEAD" && test $missing_tip != "HEAD:1.t"
then
echo $(git rev-parse HEAD:1.t) >>expect.raw &&
echo $(git rev-parse HEAD:2.t) >>expect.raw
fi &&
missing_oid="$(git rev-parse $missing_tip)" &&
if test "$missing_tip" = "annot_tag"
then
oid="$(git rev-parse $missing_tip^{commit})" &&
echo "$missing_oid" >>expect.raw
else
oid="$missing_oid"
fi &&
path=".git/objects/$(test_oid_to_path $oid)" &&
rev-list: allow missing tips with --missing=[print|allow*] In 9830926c7d (rev-list: add commit object support in `--missing` option, 2023-10-27) we fixed the `--missing` option in `git rev-list` so that it works with with missing commits, not just blobs/trees. Unfortunately, such a command would still fail with a "fatal: bad object <oid>" if it is passed a missing commit, blob or tree as an argument (before the rev walking even begins). When such a command is used to find the dependencies of some objects, for example the dependencies of quarantined objects (see the "QUARANTINE ENVIRONMENT" section in the git-receive-pack(1) documentation), it would be better if the command would instead consider such missing objects, especially commits, in the same way as other missing objects. If, for example `--missing=print` is used, it would be nice for some use cases if the missing tips passed as arguments were reported in the same way as other missing objects instead of the command just failing. We could introduce a new option to make it work like this, but most users are likely to prefer the command to have this behavior as the default one. Introducing a new option would require another dumb loop to look for that option early, which isn't nice. Also we made `git rev-list` work with missing commits very recently and the command is most often passed commits as arguments. So let's consider this as a bug fix related to these recent changes. While at it let's add a NEEDSWORK comment to say that we should get rid of the existing ugly dumb loops that parse the `--exclude-promisor-objects` and `--missing=...` options early. Helped-by: Linus Arver <linusa@google.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-14 14:25:13 +00:00
mv "$path" "$path.hidden" &&
test_when_finished "mv $path.hidden $path" &&
git rev-list --missing=$action --objects --no-object-names \
$missing_oid $existing_tip >actual.raw &&
rev-list: allow missing tips with --missing=[print|allow*] In 9830926c7d (rev-list: add commit object support in `--missing` option, 2023-10-27) we fixed the `--missing` option in `git rev-list` so that it works with with missing commits, not just blobs/trees. Unfortunately, such a command would still fail with a "fatal: bad object <oid>" if it is passed a missing commit, blob or tree as an argument (before the rev walking even begins). When such a command is used to find the dependencies of some objects, for example the dependencies of quarantined objects (see the "QUARANTINE ENVIRONMENT" section in the git-receive-pack(1) documentation), it would be better if the command would instead consider such missing objects, especially commits, in the same way as other missing objects. If, for example `--missing=print` is used, it would be nice for some use cases if the missing tips passed as arguments were reported in the same way as other missing objects instead of the command just failing. We could introduce a new option to make it work like this, but most users are likely to prefer the command to have this behavior as the default one. Introducing a new option would require another dumb loop to look for that option early, which isn't nice. Also we made `git rev-list` work with missing commits very recently and the command is most often passed commits as arguments. So let's consider this as a bug fix related to these recent changes. While at it let's add a NEEDSWORK comment to say that we should get rid of the existing ugly dumb loops that parse the `--exclude-promisor-objects` and `--missing=...` options early. Helped-by: Linus Arver <linusa@google.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-14 14:25:13 +00:00
# When the action is to print, we should also add the missing
# oid to the expect list.
case $action in
allow-any)
;;
print)
grep ?$oid actual.raw &&
echo ?$oid >>expect.raw
;;
esac &&
sort actual.raw >actual &&
sort expect.raw >expect &&
test_cmp expect actual
'
done
done
done
test_done