git/t/t5318-commit-graph.sh

949 lines
31 KiB
Bash
Raw Normal View History

#!/bin/sh
test_description='commit graph'
. ./test-lib.sh
. "$TEST_DIRECTORY"/lib-chunk.sh
GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0
test_expect_success 'usage' '
test_expect_code 129 git commit-graph write blah 2>err &&
test_expect_code 129 git commit-graph write verify
'
test_expect_success 'usage shown without sub-command' '
test_expect_code 129 git commit-graph 2>err &&
grep usage: err
'
test_expect_success 'usage shown with an error on unknown sub-command' '
cat >expect <<-\EOF &&
error: unknown subcommand: `unknown'\''
EOF
test_expect_code 129 git commit-graph unknown 2>stderr &&
grep error stderr >actual &&
test_cmp expect actual
'
objdir=".git/objects"
test_expect_success 'setup full repo' '
git init full
'
commit-graph.c: write non-split graphs as read-only In the previous commit, Git learned 'hold_lock_file_for_update_mode' to allow the caller to specify the permission bits (prior to further adjustment by the umask and shared repository permissions) used when acquiring a temporary file. Use this in the commit-graph machinery for writing a non-split graph to acquire an opened temporary file with permissions read-only permissions to match the split behavior. (In the split case, Git uses git_mkstemp_mode' for each of the commit-graph layers with permission bits '0444'). One can notice this discrepancy when moving a non-split graph to be part of a new chain. This causes a commit-graph chain where all layers have read-only permission bits, except for the base layer, which is writable for the current user. Resolve this discrepancy by using the new 'hold_lock_file_for_update_mode' and passing the desired permission bits. Doing so causes some test fallout in t5318 and t6600. In t5318, this occurs in tests that corrupt a commit-graph file by writing into it. For these, 'chmod u+w'-ing the file beforehand resolves the issue. The additional spot in 'corrupt_graph_verify' is necessary because of the extra 'git commit-graph write' beforehand (which *does* rewrite the commit-graph file). In t6600, this is caused by copying a read-only commit-graph file into place and then trying to replace it. For these, make these files writable. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-29 17:36:38 +00:00
test_expect_success POSIXPERM 'tweak umask for modebit tests' '
umask 022
'
test_expect_success 'verify graph with no graph file' '
git -C full commit-graph verify
'
test_expect_success 'write graph with no packs' '
git -C full commit-graph write --object-dir $objdir &&
test_path_is_missing full/$objdir/info/commit-graph
'
commit-graph: error out on invalid commit oids in 'write --stdin-commits' While 'git commit-graph write --stdin-commits' expects commit object ids as input, it accepts and silently skips over any invalid commit object ids, and still exits with success: # nonsense $ echo not-a-commit-oid | git commit-graph write --stdin-commits $ echo $? 0 # sometimes I forgot that refs are not good... $ echo HEAD | git commit-graph write --stdin-commits $ echo $? 0 # valid tree OID, but not a commit OID $ git rev-parse HEAD^{tree} | git commit-graph write --stdin-commits $ echo $? 0 $ ls -l .git/objects/info/commit-graph ls: cannot access '.git/objects/info/commit-graph': No such file or directory Check that all input records are indeed valid commit object ids and return with error otherwise, the same way '--stdin-packs' handles invalid input; see e103f7276f (commit-graph: return with errors during write, 2019-06-12). Note that it should only return with error when encountering an invalid commit object id coming from standard input. However, '--reachable' uses the same code path to process object ids pointed to by all refs, and that includes tag object ids as well, which should still be skipped over. Therefore add a new flag to 'enum commit_graph_write_flags' and a corresponding field to 'struct write_commit_graph_context', so we can differentiate between those two cases. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-05 08:02:40 +00:00
test_expect_success 'exit with correct error on bad input to --stdin-packs' '
echo doesnotexist >in &&
test_expect_code 1 git -C full commit-graph write --stdin-packs \
<in 2>stderr &&
test_grep "error adding pack" stderr
'
test_expect_success 'create commits and repack' '
for i in $(test_seq 3)
do
test_commit -C full $i &&
git -C full branch commits/$i || return 1
done &&
git -C full repack
'
. "$TEST_DIRECTORY"/lib-commit-graph.sh
graph_git_behavior 'no graph' full commits/3 commits/1
test_expect_success 'exit with correct error on bad input to --stdin-commits' '
commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in 'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on receiving non-commit OIDs as input to '--stdin-commits'. This behavior can be cumbersome to work around in, say, the case of piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if the caller does not want to cull out non-commits themselves. In this situation, it would be ideal if 'git commit-graph write' wrote the graph containing the inputs that did pertain to commits, and silently ignored the remainder of the input. Some options have been proposed to the effect of '--[no-]check-oids' which would allow callers to have the commit-graph builtin do just that. After some discussion, it is difficult to imagine a caller who wouldn't want to pass '--no-check-oids', suggesting that we should get rid of the behavior of complaining about non-commit inputs altogether. If callers do wish to retain this behavior, they can easily work around this change by doing the following: git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' | awk ' !/commit/ { print "not-a-commit:"$1 } /commit/ { print $1 } ' | git commit-graph write --stdin-commits To make it so that valid OIDs that refer to non-existent objects are indeed an error after loosening the error handling, perform an extra lookup to make sure that object indeed exists before sending it to the commit-graph internals. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-13 21:59:55 +00:00
# invalid, non-hex OID
echo HEAD | test_expect_code 1 git -C full commit-graph write \
--stdin-commits 2>stderr &&
test_grep "unexpected non-hex object ID: HEAD" stderr &&
commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in 'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on receiving non-commit OIDs as input to '--stdin-commits'. This behavior can be cumbersome to work around in, say, the case of piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if the caller does not want to cull out non-commits themselves. In this situation, it would be ideal if 'git commit-graph write' wrote the graph containing the inputs that did pertain to commits, and silently ignored the remainder of the input. Some options have been proposed to the effect of '--[no-]check-oids' which would allow callers to have the commit-graph builtin do just that. After some discussion, it is difficult to imagine a caller who wouldn't want to pass '--no-check-oids', suggesting that we should get rid of the behavior of complaining about non-commit inputs altogether. If callers do wish to retain this behavior, they can easily work around this change by doing the following: git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' | awk ' !/commit/ { print "not-a-commit:"$1 } /commit/ { print $1 } ' | git commit-graph write --stdin-commits To make it so that valid OIDs that refer to non-existent objects are indeed an error after loosening the error handling, perform an extra lookup to make sure that object indeed exists before sending it to the commit-graph internals. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-13 21:59:55 +00:00
# non-existent OID
echo $ZERO_OID | test_expect_code 1 git -C full commit-graph write \
--stdin-commits 2>stderr &&
test_grep "invalid object" stderr &&
commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in 'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on receiving non-commit OIDs as input to '--stdin-commits'. This behavior can be cumbersome to work around in, say, the case of piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if the caller does not want to cull out non-commits themselves. In this situation, it would be ideal if 'git commit-graph write' wrote the graph containing the inputs that did pertain to commits, and silently ignored the remainder of the input. Some options have been proposed to the effect of '--[no-]check-oids' which would allow callers to have the commit-graph builtin do just that. After some discussion, it is difficult to imagine a caller who wouldn't want to pass '--no-check-oids', suggesting that we should get rid of the behavior of complaining about non-commit inputs altogether. If callers do wish to retain this behavior, they can easily work around this change by doing the following: git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' | awk ' !/commit/ { print "not-a-commit:"$1 } /commit/ { print $1 } ' | git commit-graph write --stdin-commits To make it so that valid OIDs that refer to non-existent objects are indeed an error after loosening the error handling, perform an extra lookup to make sure that object indeed exists before sending it to the commit-graph internals. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-13 21:59:55 +00:00
# valid commit and tree OID
git -C full rev-parse HEAD HEAD^{tree} >in &&
git -C full commit-graph write --stdin-commits <in &&
graph_read_expect -C full 3 generation_data
'
test_expect_success 'write graph' '
git -C full commit-graph write &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 3 generation_data
'
commit-graph.c: write non-split graphs as read-only In the previous commit, Git learned 'hold_lock_file_for_update_mode' to allow the caller to specify the permission bits (prior to further adjustment by the umask and shared repository permissions) used when acquiring a temporary file. Use this in the commit-graph machinery for writing a non-split graph to acquire an opened temporary file with permissions read-only permissions to match the split behavior. (In the split case, Git uses git_mkstemp_mode' for each of the commit-graph layers with permission bits '0444'). One can notice this discrepancy when moving a non-split graph to be part of a new chain. This causes a commit-graph chain where all layers have read-only permission bits, except for the base layer, which is writable for the current user. Resolve this discrepancy by using the new 'hold_lock_file_for_update_mode' and passing the desired permission bits. Doing so causes some test fallout in t5318 and t6600. In t5318, this occurs in tests that corrupt a commit-graph file by writing into it. For these, 'chmod u+w'-ing the file beforehand resolves the issue. The additional spot in 'corrupt_graph_verify' is necessary because of the extra 'git commit-graph write' beforehand (which *does* rewrite the commit-graph file). In t6600, this is caused by copying a read-only commit-graph file into place and then trying to replace it. For these, make these files writable. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-29 17:36:38 +00:00
test_expect_success POSIXPERM 'write graph has correct permissions' '
test_path_is_file full/$objdir/info/commit-graph &&
commit-graph.c: write non-split graphs as read-only In the previous commit, Git learned 'hold_lock_file_for_update_mode' to allow the caller to specify the permission bits (prior to further adjustment by the umask and shared repository permissions) used when acquiring a temporary file. Use this in the commit-graph machinery for writing a non-split graph to acquire an opened temporary file with permissions read-only permissions to match the split behavior. (In the split case, Git uses git_mkstemp_mode' for each of the commit-graph layers with permission bits '0444'). One can notice this discrepancy when moving a non-split graph to be part of a new chain. This causes a commit-graph chain where all layers have read-only permission bits, except for the base layer, which is writable for the current user. Resolve this discrepancy by using the new 'hold_lock_file_for_update_mode' and passing the desired permission bits. Doing so causes some test fallout in t5318 and t6600. In t5318, this occurs in tests that corrupt a commit-graph file by writing into it. For these, 'chmod u+w'-ing the file beforehand resolves the issue. The additional spot in 'corrupt_graph_verify' is necessary because of the extra 'git commit-graph write' beforehand (which *does* rewrite the commit-graph file). In t6600, this is caused by copying a read-only commit-graph file into place and then trying to replace it. For these, make these files writable. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-29 17:36:38 +00:00
echo "-r--r--r--" >expect &&
test_modebits full/$objdir/info/commit-graph >actual &&
commit-graph.c: write non-split graphs as read-only In the previous commit, Git learned 'hold_lock_file_for_update_mode' to allow the caller to specify the permission bits (prior to further adjustment by the umask and shared repository permissions) used when acquiring a temporary file. Use this in the commit-graph machinery for writing a non-split graph to acquire an opened temporary file with permissions read-only permissions to match the split behavior. (In the split case, Git uses git_mkstemp_mode' for each of the commit-graph layers with permission bits '0444'). One can notice this discrepancy when moving a non-split graph to be part of a new chain. This causes a commit-graph chain where all layers have read-only permission bits, except for the base layer, which is writable for the current user. Resolve this discrepancy by using the new 'hold_lock_file_for_update_mode' and passing the desired permission bits. Doing so causes some test fallout in t5318 and t6600. In t5318, this occurs in tests that corrupt a commit-graph file by writing into it. For these, 'chmod u+w'-ing the file beforehand resolves the issue. The additional spot in 'corrupt_graph_verify' is necessary because of the extra 'git commit-graph write' beforehand (which *does* rewrite the commit-graph file). In t6600, this is caused by copying a read-only commit-graph file into place and then trying to replace it. For these, make these files writable. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-29 17:36:38 +00:00
test_cmp expect actual
'
graph_git_behavior 'graph exists' full commits/3 commits/1
test_expect_success 'Add more commits' '
git -C full reset --hard commits/1 &&
for i in $(test_seq 4 5)
do
test_commit -C full $i &&
git -C full branch commits/$i || return 1
done &&
git -C full reset --hard commits/2 &&
for i in $(test_seq 6 7)
do
test_commit -C full $i &&
git -C full branch commits/$i || return 1
done &&
git -C full reset --hard commits/2 &&
git -C full merge commits/4 &&
git -C full branch merge/1 &&
git -C full reset --hard commits/4 &&
git -C full merge commits/6 &&
git -C full branch merge/2 &&
git -C full reset --hard commits/3 &&
git -C full merge commits/5 commits/7 &&
git -C full branch merge/3 &&
git -C full repack
'
test_expect_success 'commit-graph write progress off for redirected stderr' '
git -C full commit-graph write 2>err &&
test_must_be_empty err
'
test_expect_success 'commit-graph write force progress on for stderr' '
GIT_PROGRESS_DELAY=0 git -C full commit-graph write --progress 2>err &&
test_file_not_empty err
'
test_expect_success 'commit-graph write with the --no-progress option' '
git -C full commit-graph write --no-progress 2>err &&
test_must_be_empty err
'
test_expect_success 'commit-graph write --stdin-commits progress off for redirected stderr' '
git -C full rev-parse commits/5 >in &&
git -C full commit-graph write --stdin-commits <in 2>err &&
test_must_be_empty err
'
test_expect_success 'commit-graph write --stdin-commits force progress on for stderr' '
git -C full rev-parse commits/5 >in &&
GIT_PROGRESS_DELAY=0 git -C full commit-graph write --stdin-commits \
--progress <in 2>err &&
test_grep "Collecting commits from input" err
'
test_expect_success 'commit-graph write --stdin-commits with the --no-progress option' '
git -C full rev-parse commits/5 >in &&
git -C full commit-graph write --stdin-commits --no-progress <in 2>err &&
test_must_be_empty err
'
test_expect_success 'commit-graph verify progress off for redirected stderr' '
git -C full commit-graph verify 2>err &&
test_must_be_empty err
'
test_expect_success 'commit-graph verify force progress on for stderr' '
GIT_PROGRESS_DELAY=0 git -C full commit-graph verify --progress 2>err &&
test_file_not_empty err
'
test_expect_success 'commit-graph verify with the --no-progress option' '
git -C full commit-graph verify --no-progress 2>err &&
test_must_be_empty err
'
# Current graph structure:
#
# __M3___
# / | \
# 3 M1 5 M2 7
# |/ \|/ \|
# 2 4 6
# |___/____/
# 1
test_expect_success 'write graph with merges' '
git -C full commit-graph write &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 10 "generation_data extra_edges"
'
graph_git_behavior 'merge 1 vs 2' full merge/1 merge/2
graph_git_behavior 'merge 1 vs 3' full merge/1 merge/3
graph_git_behavior 'merge 2 vs 3' full merge/2 merge/3
test_expect_success 'Add one more commit' '
test_commit -C full 8 &&
git -C full branch commits/8 &&
ls full/$objdir/pack | grep idx >existing-idx &&
git -C full repack &&
ls full/$objdir/pack| grep idx | grep -v -f existing-idx >new-idx
'
# Current graph structure:
#
# 8
# |
# __M3___
# / | \
# 3 M1 5 M2 7
# |/ \|/ \|
# 2 4 6
# |___/____/
# 1
graph_git_behavior 'mixed mode, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'mixed mode, commit 8 vs merge 2' full commits/8 merge/2
test_expect_success 'write graph with new commit' '
git -C full commit-graph write &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 11 "generation_data extra_edges"
'
graph_git_behavior 'full graph, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'full graph, commit 8 vs merge 2' full commits/8 merge/2
test_expect_success 'write graph with nothing new' '
git -C full commit-graph write &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 11 "generation_data extra_edges"
'
graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'cleared graph, commit 8 vs merge 2' full commits/8 merge/2
test_expect_success 'build graph from latest pack with closure' '
git -C full commit-graph write --stdin-packs <new-idx &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 9 "generation_data extra_edges"
'
graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'graph from pack, commit 8 vs merge 2' full commits/8 merge/2
test_expect_success 'build graph from commits with closure' '
git -C full tag -a -m "merge" tag/merge merge/2 &&
git -C full rev-parse tag/merge >commits-in &&
git -C full rev-parse merge/1 >>commits-in &&
git -C full commit-graph write --stdin-commits <commits-in &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 6 "generation_data"
'
graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'graph from commits, commit 8 vs merge 2' full commits/8 merge/2
test_expect_success 'build graph from commits with append' '
git -C full rev-parse merge/3 >in &&
git -C full commit-graph write --stdin-commits --append <in &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 10 "generation_data extra_edges"
'
graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
test_expect_success 'build graph using --reachable' '
git -C full commit-graph write --reachable &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 11 "generation_data extra_edges"
'
graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
test_expect_success 'setup bare repo' '
git clone --bare --no-local full bare
'
graph_git_behavior 'bare repo, commit 8 vs merge 1' bare commits/8 merge/1
graph_git_behavior 'bare repo, commit 8 vs merge 2' bare commits/8 merge/2
test_expect_success 'write graph in bare repo' '
git -C bare commit-graph write &&
test_path_is_file bare/objects/info/commit-graph &&
graph_read_expect -C bare 11 "generation_data extra_edges"
'
graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 merge/1
graph_git_behavior 'bare repo with graph, commit 8 vs merge 2' bare commits/8 merge/2
test_expect_success 'perform fast-forward merge in full repo' '
git -C full checkout -b merge-5-to-8 commits/5 &&
git -C full merge commits/8 &&
git -C full show-ref -s merge-5-to-8 >output &&
git -C full show-ref -s commits/8 >expect &&
test_cmp expect output
'
test_expect_success 'check that gc computes commit-graph' '
test_commit -C full --no-tag blank &&
git -C full commit-graph write --reachable &&
cp full/$objdir/info/commit-graph commit-graph-before-gc &&
git -C full reset --hard HEAD~1 &&
test_config -C full gc.writeCommitGraph true &&
git -C full gc &&
cp full/$objdir/info/commit-graph commit-graph-after-gc &&
! test_cmp_bin commit-graph-before-gc commit-graph-after-gc &&
git -C full commit-graph write --reachable &&
test_cmp_bin commit-graph-after-gc full/$objdir/info/commit-graph
'
test_expect_success 'replace-objects invalidates commit-graph' '
test_when_finished rm -rf replace &&
git clone full replace &&
(
cd replace &&
git commit-graph write --reachable &&
test_path_is_file .git/objects/info/commit-graph &&
git replace HEAD~1 HEAD~2 &&
graph_git_two_modes "commit-graph verify" &&
git -c core.commitGraph=false log >expect &&
git -c core.commitGraph=true log >actual &&
test_cmp expect actual &&
git commit-graph write --reachable &&
git -c core.commitGraph=false --no-replace-objects log >expect &&
git -c core.commitGraph=true --no-replace-objects log >actual &&
test_cmp expect actual &&
rm -rf .git/objects/info/commit-graph &&
git commit-graph write --reachable &&
test_path_is_file .git/objects/info/commit-graph
)
'
test_expect_success 'commit grafts invalidate commit-graph' '
test_when_finished rm -rf graft &&
git clone --template= full graft &&
(
cd graft &&
git commit-graph write --reachable &&
test_path_is_file .git/objects/info/commit-graph &&
H1=$(git rev-parse --verify HEAD~1) &&
H3=$(git rev-parse --verify HEAD~3) &&
mkdir .git/info &&
echo "$H1 $H3" >.git/info/grafts &&
git -c core.commitGraph=false log >expect &&
git -c core.commitGraph=true log >actual &&
test_cmp expect actual &&
git commit-graph write --reachable &&
git -c core.commitGraph=false --no-replace-objects log >expect &&
git -c core.commitGraph=true --no-replace-objects log >actual &&
test_cmp expect actual &&
rm -rf .git/objects/info/commit-graph &&
git commit-graph write --reachable &&
test_path_is_missing .git/objects/info/commit-graph
)
'
test_expect_success 'replace-objects invalidates commit-graph' '
test_when_finished rm -rf shallow &&
git clone --depth 2 "file://$TRASH_DIRECTORY/full" shallow &&
(
cd shallow &&
git commit-graph write --reachable &&
test_path_is_missing .git/objects/info/commit-graph &&
git fetch origin --unshallow &&
git commit-graph write --reachable &&
test_path_is_file .git/objects/info/commit-graph
)
'
commit-graph: use the "hash version" byte The commit-graph format reserved a byte among the header of the file to store a "hash version". During the SHA-256 work, this was not modified because file formats are not necessarily intended to work across hash versions. If a repository has SHA-256 as its hash algorithm, it automatically up-shifts the lengths of object names in all necessary formats. However, since we have this byte available for adjusting the version, we can make the file formats more obviously incompatible instead of relying on other context from the repository. Update the oid_version() method in commit-graph.c to add a new value, 2, for sha-256. This automatically writes the new value in a SHA-256 repository _and_ verifies the value is correct. This is a breaking change relative to the current 'master' branch since 092b677 (Merge branch 'bc/sha-256-cvs-svn-updates', 2020-08-13) but it is not breaking relative to any released version of Git. The test impact is relatively minor: the output of 'test-tool read-graph' lists the header information, so those instances of '1' need to be replaced with a variable determined by GIT_TEST_DEFAULT_HASH. A more careful test is added that specifically creates a repository of each type then swaps the commit-graph files. The important value here is that the "git log" command succeeds while writing a message to stderr. Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 14:04:47 +00:00
test_expect_success 'warn on improper hash version' '
git init --object-format=sha1 sha1 &&
(
cd sha1 &&
test_commit 1 &&
git commit-graph write --reachable &&
mv .git/objects/info/commit-graph ../cg-sha1
) &&
git init --object-format=sha256 sha256 &&
(
cd sha256 &&
test_commit 1 &&
git commit-graph write --reachable &&
mv .git/objects/info/commit-graph ../cg-sha256
) &&
(
cd sha1 &&
mv ../cg-sha256 .git/objects/info/commit-graph &&
git log -1 2>err &&
test_grep "commit-graph hash version 2 does not match version 1" err
commit-graph: use the "hash version" byte The commit-graph format reserved a byte among the header of the file to store a "hash version". During the SHA-256 work, this was not modified because file formats are not necessarily intended to work across hash versions. If a repository has SHA-256 as its hash algorithm, it automatically up-shifts the lengths of object names in all necessary formats. However, since we have this byte available for adjusting the version, we can make the file formats more obviously incompatible instead of relying on other context from the repository. Update the oid_version() method in commit-graph.c to add a new value, 2, for sha-256. This automatically writes the new value in a SHA-256 repository _and_ verifies the value is correct. This is a breaking change relative to the current 'master' branch since 092b677 (Merge branch 'bc/sha-256-cvs-svn-updates', 2020-08-13) but it is not breaking relative to any released version of Git. The test impact is relatively minor: the output of 'test-tool read-graph' lists the header information, so those instances of '1' need to be replaced with a variable determined by GIT_TEST_DEFAULT_HASH. A more careful test is added that specifically creates a repository of each type then swaps the commit-graph files. The important value here is that the "git log" command succeeds while writing a message to stderr. Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 14:04:47 +00:00
) &&
(
cd sha256 &&
mv ../cg-sha1 .git/objects/info/commit-graph &&
git log -1 2>err &&
test_grep "commit-graph hash version 1 does not match version 2" err
commit-graph: use the "hash version" byte The commit-graph format reserved a byte among the header of the file to store a "hash version". During the SHA-256 work, this was not modified because file formats are not necessarily intended to work across hash versions. If a repository has SHA-256 as its hash algorithm, it automatically up-shifts the lengths of object names in all necessary formats. However, since we have this byte available for adjusting the version, we can make the file formats more obviously incompatible instead of relying on other context from the repository. Update the oid_version() method in commit-graph.c to add a new value, 2, for sha-256. This automatically writes the new value in a SHA-256 repository _and_ verifies the value is correct. This is a breaking change relative to the current 'master' branch since 092b677 (Merge branch 'bc/sha-256-cvs-svn-updates', 2020-08-13) but it is not breaking relative to any released version of Git. The test impact is relatively minor: the output of 'test-tool read-graph' lists the header information, so those instances of '1' need to be replaced with a variable determined by GIT_TEST_DEFAULT_HASH. A more careful test is added that specifically creates a repository of each type then swaps the commit-graph files. The important value here is that the "git log" command succeeds while writing a message to stderr. Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 14:04:47 +00:00
)
'
2022-03-01 19:48:30 +00:00
test_expect_success TIME_IS_64BIT,TIME_T_IS_64BIT 'lower layers have overflow chunk' '
commit-graph: always parse before commit_graph_data_at() There is a subtle failure happening when computing corrected commit dates with --split enabled. It requires a base layer needing the generation_data_overflow chunk. Then, the next layer on top erroneously thinks it needs an overflow chunk due to a bug leading to recalculating all reachable generation numbers. The output of the failure is BUG: commit-graph.c:1912: expected to write 8 bytes to chunk 47444f56, but wrote 0 instead These "expected" 8 bytes are due to re-computing the corrected commit date for the lower layer but the new layer does not need any overflow. Add a test to t5318-commit-graph.sh that demonstrates this bug. However, it does not trigger consistently with the existing code. The generation number data is stored in a slab and accessed by commit_graph_data_at(). This data is initialized when parsing a commit, but is otherwise used assuming it has been populated. The loop in compute_generation_numbers() did not enforce that all reachable commits were parsed and had correct values. This could lead to some problems when writing a commit-graph with corrected commit dates based on a commit-graph without them. It has been difficult to identify the issue here because it was so hard to reproduce. It relies on this uninitialized data having a non-zero value, but also on specifically in a way that overwrites the existing data. This patch adds the extra parse to ensure the data is filled before we compute the generation number of a commit. This triggers the new test to fail because the generation number overflow count does not match between this computation and the write for that chunk. The actual fix will follow as the next few changes. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-01 17:15:04 +00:00
UNIX_EPOCH_ZERO="@0 +0000" &&
2022-03-01 19:48:30 +00:00
FUTURE_DATE="@4147483646 +0000" &&
rm -f full/.git/objects/info/commit-graph &&
test_commit -C full --date "$FUTURE_DATE" future-1 &&
test_commit -C full --date "$UNIX_EPOCH_ZERO" old-1 &&
git -C full commit-graph write --reachable &&
test_commit -C full --date "$FUTURE_DATE" future-2 &&
test_commit -C full --date "$UNIX_EPOCH_ZERO" old-2 &&
git -C full commit-graph write --reachable --split=no-merge &&
test_commit -C full extra &&
git -C full commit-graph write --reachable --split=no-merge &&
git -C full commit-graph write --reachable &&
graph_read_expect -C full 16 \
"generation_data generation_data_overflow extra_edges" &&
mv full/.git/objects/info/commit-graph commit-graph-upgraded &&
git -C full commit-graph write --reachable &&
graph_read_expect -C full 16 \
"generation_data generation_data_overflow extra_edges" &&
test_cmp full/.git/objects/info/commit-graph commit-graph-upgraded
commit-graph: always parse before commit_graph_data_at() There is a subtle failure happening when computing corrected commit dates with --split enabled. It requires a base layer needing the generation_data_overflow chunk. Then, the next layer on top erroneously thinks it needs an overflow chunk due to a bug leading to recalculating all reachable generation numbers. The output of the failure is BUG: commit-graph.c:1912: expected to write 8 bytes to chunk 47444f56, but wrote 0 instead These "expected" 8 bytes are due to re-computing the corrected commit date for the lower layer but the new layer does not need any overflow. Add a test to t5318-commit-graph.sh that demonstrates this bug. However, it does not trigger consistently with the existing code. The generation number data is stored in a slab and accessed by commit_graph_data_at(). This data is initialized when parsing a commit, but is otherwise used assuming it has been populated. The loop in compute_generation_numbers() did not enforce that all reachable commits were parsed and had correct values. This could lead to some problems when writing a commit-graph with corrected commit dates based on a commit-graph without them. It has been difficult to identify the issue here because it was so hard to reproduce. It relies on this uninitialized data having a non-zero value, but also on specifically in a way that overwrites the existing data. This patch adds the extra parse to ensure the data is filled before we compute the generation number of a commit. This triggers the new test to fail because the generation number overflow count does not match between this computation and the write for that chunk. The actual fix will follow as the next few changes. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-01 17:15:04 +00:00
'
# the verify tests below expect the commit-graph to contain
# exactly the commits reachable from the commits/8 branch.
# If the file changes the set of commits in the list, then the
# offsets into the binary file will result in different edits
# and the tests will likely break.
test_expect_success 'git commit-graph verify' '
git -C full rev-parse commits/8 >in &&
git -C full -c commitGraph.generationVersion=1 commit-graph write \
--stdin-commits <in &&
git -C full commit-graph verify >output &&
graph_read_expect -C full 9 extra_edges 1
'
NUM_COMMITS=9
NUM_OCTOPUS_EDGES=2
HASH_LEN="$(test_oid rawsz)"
GRAPH_BYTE_VERSION=4
GRAPH_BYTE_HASH=5
GRAPH_BYTE_CHUNK_COUNT=6
GRAPH_CHUNK_LOOKUP_OFFSET=8
GRAPH_CHUNK_LOOKUP_WIDTH=12
GRAPH_CHUNK_LOOKUP_ROWS=5
GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET
GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
1 * $GRAPH_CHUNK_LOOKUP_WIDTH))
GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
2 * $GRAPH_CHUNK_LOOKUP_WIDTH))
GRAPH_FANOUT_OFFSET=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
$GRAPH_CHUNK_LOOKUP_WIDTH * $GRAPH_CHUNK_LOOKUP_ROWS))
GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 * 4))
GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 10))
GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * $NUM_COMMITS))
GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
GRAPH_BYTE_COMMIT_GENERATION_LAST=$(($GRAPH_BYTE_COMMIT_GENERATION + $(($NUM_COMMITS - 1)) * $GRAPH_COMMIT_DATA_WIDTH))
GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
$GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS))
GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES))
corrupt_graph_setup() {
test_when_finished mv commit-graph-backup full/$objdir/info/commit-graph &&
cp full/$objdir/info/commit-graph commit-graph-backup &&
chmod u+w full/$objdir/info/commit-graph
}
corrupt_graph_verify() {
grepstr=$1
test_must_fail git -C full commit-graph verify 2>test_err &&
grep -v "^+" test_err >err &&
test_grep "$grepstr" err &&
commit-graph write: don't die if the existing graph is corrupt When the commit-graph is written we end up calling parse_commit(). This will in turn invoke code that'll consult the existing commit-graph about the commit, if the graph is corrupted we die. We thus get into a state where a failing "commit-graph verify" can't be followed-up with a "commit-graph write" if core.commitGraph=true is set, the graph either needs to be manually removed to proceed, or core.commitGraph needs to be set to "false". Change the "commit-graph write" codepath to use a new parse_commit_no_graph() helper instead of parse_commit() to avoid this. The latter will call repo_parse_commit_internal() with use_commit_graph=1 as seen in 177722b344 ("commit: integrate commit graph with commit parsing", 2018-04-10). Not using the old graph at all slows down the writing of the new graph by some small amount, but is a sensible way to prevent an error in the existing commit-graph from spreading. Just fixing the current issue would be likely to result in code that's inadvertently broken in the future. New code might use the commit-graph at a distance. To detect such cases introduce a "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" setting used when we do our corruption tests, and test that a "write/verify" combo works after every one of our current test cases where we now detect commit-graph corruption. Some of the code changes here might be strictly unnecessary, e.g. I was unable to find cases where the parse_commit() called from write_graph_chunk_data() didn't exit early due to "item->object.parsed" being true in repo_parse_commit_internal() (before the use_commit_graph=1 has any effect). But let's also convert those cases for good measure, we do not have exhaustive tests for all possible types of commit-graph corruption. This might need to be re-visited if we learn to write the commit-graph incrementally, but probably not. Hopefully we'll just start by finding out what commits we have in total, then read the old graph(s) to see what they cover, and finally write a new graph file with everything that's missing. In that case the new graph writing code just needs to continue to use e.g. a parse_commit() that doesn't consult the existing commit-graphs. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-25 12:08:33 +00:00
if test "$2" != "no-copy"
then
cp full/$objdir/info/commit-graph commit-graph-pre-write-test
commit-graph write: don't die if the existing graph is corrupt When the commit-graph is written we end up calling parse_commit(). This will in turn invoke code that'll consult the existing commit-graph about the commit, if the graph is corrupted we die. We thus get into a state where a failing "commit-graph verify" can't be followed-up with a "commit-graph write" if core.commitGraph=true is set, the graph either needs to be manually removed to proceed, or core.commitGraph needs to be set to "false". Change the "commit-graph write" codepath to use a new parse_commit_no_graph() helper instead of parse_commit() to avoid this. The latter will call repo_parse_commit_internal() with use_commit_graph=1 as seen in 177722b344 ("commit: integrate commit graph with commit parsing", 2018-04-10). Not using the old graph at all slows down the writing of the new graph by some small amount, but is a sensible way to prevent an error in the existing commit-graph from spreading. Just fixing the current issue would be likely to result in code that's inadvertently broken in the future. New code might use the commit-graph at a distance. To detect such cases introduce a "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" setting used when we do our corruption tests, and test that a "write/verify" combo works after every one of our current test cases where we now detect commit-graph corruption. Some of the code changes here might be strictly unnecessary, e.g. I was unable to find cases where the parse_commit() called from write_graph_chunk_data() didn't exit early due to "item->object.parsed" being true in repo_parse_commit_internal() (before the use_commit_graph=1 has any effect). But let's also convert those cases for good measure, we do not have exhaustive tests for all possible types of commit-graph corruption. This might need to be re-visited if we learn to write the commit-graph incrementally, but probably not. Hopefully we'll just start by finding out what commits we have in total, then read the old graph(s) to see what they cover, and finally write a new graph file with everything that's missing. In that case the new graph writing code just needs to continue to use e.g. a parse_commit() that doesn't consult the existing commit-graphs. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-25 12:08:33 +00:00
fi &&
git -C full status --short &&
GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE=true git -C full commit-graph write &&
chmod u+w full/$objdir/info/commit-graph &&
git -C full commit-graph verify
}
# usage: corrupt_graph_and_verify <position> <data> <string> [<zero_pos>]
# Manipulates the commit-graph file at the position
# by inserting the data, optionally zeroing the file
# starting at <zero_pos>, then runs 'git commit-graph verify'
# and places the output in the file 'err'. Test 'err' for
# the given string.
corrupt_graph_and_verify() {
pos=$1
data="${2:-\0}"
grepstr=$3
corrupt_graph_setup &&
orig_size=$(wc -c <full/$objdir/info/commit-graph) &&
zero_pos=${4:-${orig_size}} &&
printf "$data" | dd of="full/$objdir/info/commit-graph" bs=1 seek="$pos" conv=notrunc &&
dd of="full/$objdir/info/commit-graph" bs=1 seek="$zero_pos" if=/dev/null &&
test-tool genzeros $(($orig_size - $zero_pos)) >>"full/$objdir/info/commit-graph" &&
corrupt_graph_verify "$grepstr"
}
test_expect_success POSIXPERM,SANITY 'detect permission problem' '
corrupt_graph_setup &&
chmod 000 full/$objdir/info/commit-graph &&
commit-graph write: don't die if the existing graph is corrupt When the commit-graph is written we end up calling parse_commit(). This will in turn invoke code that'll consult the existing commit-graph about the commit, if the graph is corrupted we die. We thus get into a state where a failing "commit-graph verify" can't be followed-up with a "commit-graph write" if core.commitGraph=true is set, the graph either needs to be manually removed to proceed, or core.commitGraph needs to be set to "false". Change the "commit-graph write" codepath to use a new parse_commit_no_graph() helper instead of parse_commit() to avoid this. The latter will call repo_parse_commit_internal() with use_commit_graph=1 as seen in 177722b344 ("commit: integrate commit graph with commit parsing", 2018-04-10). Not using the old graph at all slows down the writing of the new graph by some small amount, but is a sensible way to prevent an error in the existing commit-graph from spreading. Just fixing the current issue would be likely to result in code that's inadvertently broken in the future. New code might use the commit-graph at a distance. To detect such cases introduce a "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" setting used when we do our corruption tests, and test that a "write/verify" combo works after every one of our current test cases where we now detect commit-graph corruption. Some of the code changes here might be strictly unnecessary, e.g. I was unable to find cases where the parse_commit() called from write_graph_chunk_data() didn't exit early due to "item->object.parsed" being true in repo_parse_commit_internal() (before the use_commit_graph=1 has any effect). But let's also convert those cases for good measure, we do not have exhaustive tests for all possible types of commit-graph corruption. This might need to be re-visited if we learn to write the commit-graph incrementally, but probably not. Hopefully we'll just start by finding out what commits we have in total, then read the old graph(s) to see what they cover, and finally write a new graph file with everything that's missing. In that case the new graph writing code just needs to continue to use e.g. a parse_commit() that doesn't consult the existing commit-graphs. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-25 12:08:33 +00:00
corrupt_graph_verify "Could not open" "no-copy"
'
test_expect_success 'detect too small' '
corrupt_graph_setup &&
echo "a small graph" >full/$objdir/info/commit-graph &&
corrupt_graph_verify "too small"
'
test_expect_success 'detect bad signature' '
corrupt_graph_and_verify 0 "\0" \
"graph signature"
'
test_expect_success 'detect bad version' '
corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
"graph version"
'
test_expect_success 'detect bad hash version' '
corrupt_graph_and_verify $GRAPH_BYTE_HASH "\03" \
"hash version"
'
test_expect_success 'detect low chunk count' '
commit-graph: fix parsing the Chunk Lookup table The commit-graph file format specifies that the chunks may be in any order. However, if the OID Lookup chunk happens to be the last one in the file, then any command attempting to access the commit-graph data will fail with: fatal: invalid commit position. commit-graph is likely corrupt In this case the error is wrong, the commit-graph file does conform to the specification, but the parsing of the Chunk Lookup table is a bit buggy, and leaves the field holding the number of commits in the commit-graph zero-initialized. The number of commits in the commit-graph is determined while parsing the Chunk Lookup table, by dividing the size of the OID Lookup chunk with the hash size. However, the Chunk Lookup table doesn't actually store the size of the chunks, but it stores their starting offset. Consequently, the size of a chunk can only be calculated by subtracting the starting offsets of that chunk from the offset of the subsequent chunk, or in case of the last chunk from the offset recorded in the terminating label. This is currenly implemented in a bit complicated way: as we iterate over the entries of the Chunk Lookup table, we check the ID of each chunk and store its starting offset, then we check the ID of the last seen chunk and calculate its size using its previously saved offset if necessary (at the moment it's only necessary for the OID Lookup chunk). Alas, while parsing the Chunk Lookup table we only interate through the "real" chunks, but never look at the terminating label, thus don't even check whether it's necessary to calulate the size of the last chunk. Consequently, if the OID Lookup chunk is the last one, then we don't calculate its size and turn don't run the piece of code determining the number of commits in the commit graph, leaving the field holding that number unchanged (i.e. zero-initialized), eventually triggering the sanity check in load_oid_from_graph(). Fix this by iterating through all entries in the Chunk Lookup table, including the terminating label. Note that this is the minimal fix, suitable for the maintenance track. A better fix would be to simplify how the chunk sizes are calculated, but that is a more invasive change, less suitable for 'maint', so that will be done in later patches. This additional flexibility of scanning more chunks breaks a test for "git commit-graph verify" so alter that test to mutate the commit-graph to have an even lower chunk count. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 13:00:24 +00:00
corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\01" \
"final chunk has non-zero id"
'
test_expect_success 'detect missing OID fanout chunk' '
corrupt_graph_and_verify $GRAPH_BYTE_OID_FANOUT_ID "\0" \
"commit-graph required OID fanout chunk missing or corrupted"
'
test_expect_success 'detect missing OID lookup chunk' '
corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ID "\0" \
"commit-graph required OID lookup chunk missing or corrupted"
'
test_expect_success 'detect missing commit data chunk' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATA_ID "\0" \
"commit-graph required commit data chunk missing or corrupted"
'
test_expect_success 'detect incorrect fanout' '
corrupt_graph_and_verify $GRAPH_BYTE_FANOUT1 "\01" \
"fanout value"
'
test_expect_success 'detect incorrect fanout final value' '
corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \
commit-graph: use fanout value for graph size Commit-graph, midx, and pack idx files all have both a lookup table of oids and an oid fanout table. In midx and pack idx files, we take the final entry of the fanout table as the source of truth for the number of entries, and then verify that the size of the lookup table matches that. But for commit-graph files, we do the opposite: we use the size of the lookup table as the source of truth, and then check the final fanout entry against it. As noted in 4169d89645 (commit-graph: check consistency of fanout table, 2023-10-09), either is correct. But there are a few reasons to prefer the fanout table as the source of truth: 1. The fanout entries are 32-bits on disk, and that defines the maximum number of entries we can store. But since the size of the lookup table is only bounded by the filesystem, it can be much larger. And hence computing it as the commit-graph does means that we may truncate the result when storing it in a uint32_t. 2. We read the fanout first, then the lookup table. If we're verifying the chunks as we read them, then we'd want to take the fanout as truth (we have nothing yet to check it against) and then we can check that the lookup table matches what we already know. 3. It is pointlessly inconsistent with the midx and pack idx code. Since the three have to do similar size and bounds checks, it is easier to reason about all three if they use the same approach. So this patch moves the assignment of g->num_commits to the fanout parser, and then we can check the size of the lookup chunk as soon as we try to load it. There's already a test covering this situation, which munges the final fanout entry to 2^32-1. In the current code we complain that it does not agree with the table size. But now that we treat the munged value as the source of truth, we'll complain that the lookup table is the wrong size (again, either is correct). So we'll have to update the message we expect (and likewise for an earlier test which does similar munging). There's a similar test for this situation on the midx side, but rather than making a very-large fanout value, it just truncates the lookup table. We could do that here, too, but the very-large fanout value actually shows an interesting corner case. On a 32-bit system, multiplying to find the expected table size would cause an integer overflow. Using st_mult() would detect that, but cause us to die() rather than falling back to the non-graph code path. Checking the size using division (as we do with existing chunk-size checks) avoids the overflow entirely, and the test demonstrates this when run on a 32-bit system. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-09 07:24:35 +00:00
"OID lookup chunk is the wrong size"
'
test_expect_success 'detect incorrect OID order' '
corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ORDER "\01" \
"incorrect OID order"
'
test_expect_success 'detect OID not in object database' '
corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_MISSING "\01" \
"from object database"
'
test_expect_success 'detect incorrect tree OID' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_TREE "\01" \
"root tree OID for commit"
'
test_expect_success 'detect incorrect parent int-id' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_PARENT "\01" \
"invalid parent"
'
test_expect_success 'detect extra parent int-id' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_EXTRA_PARENT "\00" \
"is too long"
'
test_expect_success 'detect wrong parent' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_WRONG_PARENT "\01" \
"commit-graph parent for"
'
test_expect_success 'detect incorrect generation number' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\070" \
"generation for commit"
'
test_expect_success 'detect incorrect commit date' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATE "\01" \
"commit date"
'
test_expect_success 'detect incorrect parent for octopus merge' '
corrupt_graph_and_verify $GRAPH_BYTE_OCTOPUS "\01" \
"invalid parent"
'
test_expect_success 'detect invalid checksum hash' '
corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
"incorrect checksum"
'
test_expect_success 'detect incorrect chunk count' '
corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\377" \
"commit-graph file is too small to hold [0-9]* chunks" \
$GRAPH_CHUNK_LOOKUP_OFFSET
'
test_expect_success 'detect mixed generation numbers (non-zero to zero)' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION_LAST "\0\0\0\0" \
"both zero and non-zero generations"
'
test_expect_success 'detect mixed generation numbers (zero to non-zero)' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\0\0\0\0" \
"both zero and non-zero generations"
'
test_expect_success 'git fsck (checks commit-graph when config set to true)' '
git -C full fsck &&
corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
"incorrect checksum" &&
cp commit-graph-pre-write-test full/$objdir/info/commit-graph &&
test_must_fail git -C full -c core.commitGraph=true fsck
'
test_expect_success 'git fsck (ignores commit-graph when config set to false)' '
git -C full fsck &&
corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
"incorrect checksum" &&
cp commit-graph-pre-write-test full/$objdir/info/commit-graph &&
git -C full -c core.commitGraph=false fsck
'
test_expect_success 'git fsck (checks commit-graph when config unset)' '
test_when_finished "git -C full config core.commitGraph true" &&
git -C full fsck &&
corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
"incorrect checksum" &&
test_unconfig -C full core.commitGraph &&
cp commit-graph-pre-write-test full/$objdir/info/commit-graph &&
test_must_fail git -C full fsck
'
test_expect_success 'git fsck shows commit-graph output with --progress' '
git -C "$TRASH_DIRECTORY/full" fsck --progress 2>err &&
grep "Verifying commits in commit graph" err
'
test_expect_success 'git fsck suppresses commit-graph output with --no-progress' '
git -C "$TRASH_DIRECTORY/full" fsck --no-progress 2>err &&
! grep "Verifying commits in commit graph" err
'
test_expect_success 'setup non-the_repository tests' '
rm -rf repo &&
git init repo &&
test_commit -C repo one &&
test_commit -C repo two &&
git -C repo config core.commitGraph true &&
git -C repo rev-parse two | \
git -C repo commit-graph write --stdin-commits
'
test_expect_success 'parse_commit_in_graph works for non-the_repository' '
test-tool repository parse_commit_in_graph \
repo/.git repo "$(git -C repo rev-parse two)" >actual &&
{
git -C repo log --pretty=format:"%ct " -1 &&
git -C repo rev-parse one
} >expect &&
test_cmp expect actual &&
test-tool repository parse_commit_in_graph \
repo/.git repo "$(git -C repo rev-parse one)" >actual &&
git -C repo log --pretty="%ct" -1 one >expect &&
test_cmp expect actual
'
test_expect_success 'get_commit_tree_in_graph works for non-the_repository' '
test-tool repository get_commit_tree_in_graph \
repo/.git repo "$(git -C repo rev-parse two)" >actual &&
git -C repo rev-parse two^{tree} >expect &&
test_cmp expect actual &&
test-tool repository get_commit_tree_in_graph \
repo/.git repo "$(git -C repo rev-parse one)" >actual &&
git -C repo rev-parse one^{tree} >expect &&
test_cmp expect actual
'
2019-09-05 22:04:55 +00:00
test_expect_success 'corrupt commit-graph write (broken parent)' '
rm -rf repo &&
git init repo &&
(
cd repo &&
empty="$(git mktree </dev/null)" &&
cat >broken <<-EOF &&
tree $empty
parent $ZERO_OID
author whatever <whatever@example.com> 1234 -0000
committer whatever <whatever@example.com> 1234 -0000
broken commit
EOF
broken="$(git hash-object -w -t commit --literally broken)" &&
git commit-tree -p "$broken" -m "good commit" "$empty" >good &&
test_must_fail git commit-graph write --stdin-commits \
<good 2>test_err &&
test_grep "unable to parse commit" test_err
)
'
test_expect_success 'corrupt commit-graph write (missing tree)' '
rm -rf repo &&
git init repo &&
(
cd repo &&
tree="$(git mktree </dev/null)" &&
cat >broken <<-EOF &&
parent $ZERO_OID
author whatever <whatever@example.com> 1234 -0000
committer whatever <whatever@example.com> 1234 -0000
broken commit
EOF
broken="$(git hash-object -w -t commit --literally broken)" &&
git commit-tree -p "$broken" -m "good" "$tree" >good &&
test_must_fail git commit-graph write --stdin-commits \
<good 2>test_err &&
test_grep "unable to parse commit" test_err
)
'
commit-graph: implement generation data chunk As discovered by Ævar, we cannot increment graph version to distinguish between generation numbers v1 and v2 [1]. Thus, one of pre-requistes before implementing generation number v2 was to distinguish between graph versions in a backwards compatible manner. We are going to introduce a new chunk called Generation DATa chunk (or GDAT). GDAT will store corrected committer date offsets whereas CDAT will still store topological level. Old Git does not understand GDAT chunk and would ignore it, reading topological levels from CDAT. New Git can parse GDAT and take advantage of newer generation numbers, falling back to topological levels when GDAT chunk is missing (as it would happen with a commit-graph written by old Git). We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT' which forces commit-graph file to be written without generation data chunk to emulate a commit-graph file written by old Git. To minimize the space required to store corrrected commit date, Git stores corrected commit date offsets into the commit-graph file, instea of corrected commit dates. This saves us 4 bytes per commit, decreasing the GDAT chunk size by half, but it's possible for the offset to overflow the 4-bytes allocated for storage. As such overflows are and should be exceedingly rare, we use the following overflow management scheme: We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV') to store corrected commit dates for commits with offsets greater than GENERATION_NUMBER_V2_OFFSET_MAX. If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set the MSB of the offset and the other bits store the position of corrected commit date in GDOV chunk, similar to how Extra Edge List is maintained. We test the overflow-related code with the following repo history: F - N - U / \ U - N - U N \ / N - F - N Where the commits denoted by U have committer date of zero seconds since Unix epoch, the commits denoted by N have committer date of 1112354055 (default committer date for the test suite) seconds since Unix epoch and the commits denoted by F have committer date of (2 ^ 31 - 2) seconds since Unix epoch. The largest offset observed is 2 ^ 31, just large enough to overflow. [1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/ Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-16 18:11:15 +00:00
# We test the overflow-related code with the following repo history:
#
# 4:F - 5:N - 6:U
# / \
# 1:U - 2:N - 3:U M:N
# \ /
# 7:N - 8:F - 9:N
#
# Here the commits denoted by U have committer date of zero seconds
# since Unix epoch, the commits denoted by N have committer date
# starting from 1112354055 seconds since Unix epoch (default committer
# date for the test suite), and the commits denoted by F have committer
# date of (2 ^ 31 - 2) seconds since Unix epoch.
#
# The largest offset observed is 2 ^ 31, just large enough to overflow.
#
test_expect_success 'set up and verify repo with generation data overflow chunk' '
UNIX_EPOCH_ZERO="@0 +0000" &&
FUTURE_DATE="@2147483646 +0000" &&
git init repo &&
(
cd repo &&
test_commit --date "$UNIX_EPOCH_ZERO" 1 &&
test_commit 2 &&
test_commit --date "$UNIX_EPOCH_ZERO" 3 &&
git commit-graph write --reachable &&
graph_read_expect 3 generation_data &&
test_commit --date "$FUTURE_DATE" 4 &&
test_commit 5 &&
test_commit --date "$UNIX_EPOCH_ZERO" 6 &&
git branch left &&
git reset --hard 3 &&
test_commit 7 &&
test_commit --date "$FUTURE_DATE" 8 &&
test_commit 9 &&
git branch right &&
git reset --hard 3 &&
test_merge M left right &&
git commit-graph write --reachable &&
graph_read_expect 10 "generation_data generation_data_overflow" &&
git commit-graph verify
)
commit-graph: implement generation data chunk As discovered by Ævar, we cannot increment graph version to distinguish between generation numbers v1 and v2 [1]. Thus, one of pre-requistes before implementing generation number v2 was to distinguish between graph versions in a backwards compatible manner. We are going to introduce a new chunk called Generation DATa chunk (or GDAT). GDAT will store corrected committer date offsets whereas CDAT will still store topological level. Old Git does not understand GDAT chunk and would ignore it, reading topological levels from CDAT. New Git can parse GDAT and take advantage of newer generation numbers, falling back to topological levels when GDAT chunk is missing (as it would happen with a commit-graph written by old Git). We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT' which forces commit-graph file to be written without generation data chunk to emulate a commit-graph file written by old Git. To minimize the space required to store corrrected commit date, Git stores corrected commit date offsets into the commit-graph file, instea of corrected commit dates. This saves us 4 bytes per commit, decreasing the GDAT chunk size by half, but it's possible for the offset to overflow the 4-bytes allocated for storage. As such overflows are and should be exceedingly rare, we use the following overflow management scheme: We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV') to store corrected commit dates for commits with offsets greater than GENERATION_NUMBER_V2_OFFSET_MAX. If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set the MSB of the offset and the other bits store the position of corrected commit date in GDOV chunk, similar to how Extra Edge List is maintained. We test the overflow-related code with the following repo history: F - N - U / \ U - N - U N \ / N - F - N Where the commits denoted by U have committer date of zero seconds since Unix epoch, the commits denoted by N have committer date of 1112354055 (default committer date for the test suite) seconds since Unix epoch and the commits denoted by F have committer date of (2 ^ 31 - 2) seconds since Unix epoch. The largest offset observed is 2 ^ 31, just large enough to overflow. [1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/ Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-16 18:11:15 +00:00
'
graph_git_behavior 'generation data overflow chunk repo' repo left right
commit-graph: fix corrupt upgrade from generation v1 to v2 The previous commit demonstrates a bug where a commit-graph using generation v2 could enter a state where one of the GDA2 values has its most-significant bit set (indicating that its value should be read from the extended offset table in the GDO2 chunk) without having a GDO2 chunk to read from. This results in the following error message being displayed to the caller: fatal: commit-graph requires overflow generation data but has none This bug arises in the following scenario: - We decide to write a commit-graph using generation number v2, and decide (correctly) that no GDO2 chunk is necessary (e.g., because all of the commiter date offsets are no larger than 2^31-1). - The v2 generation numbers are stored in the `->generation` member of the commit slab holding `struct commit_graph_data`'s. - Later on, `load_commit_graph_info()` is called, overwriting the v2 generation data in the aforementioned slab with any existing v1 generation data. Then, when the commit-graph code goes to write the GDA2 chunk via `write_graph_chunk_generation_data()`, we use the overwritten generation v1 data in a place where we expect to use a v2 generation number: offset = commit_graph_data_at(c)->generation - c->date; ...because `commit_graph_data_at(c)->generation` used to hold the v2 generation data, but it was overwritten to contain the v1 generation number via `load_commit_graph_info()`. If the `offset` computation above overflows the v2 generation number max, then `write_graph_chunk_generation_data()` will update its count of large offsets and write the marker accordingly: if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) { offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows; num_generation_data_overflows++; } and reads will look for the GDO2 chunk containing the overflowing v2 generation number, *after* the commit-graph code decided that no such chunk was necessary. The main problem is that the slab containing `struct commit_graph_data` has a dual purpose. It is used to hold data that we are about to write to disk while generating a commit-graph, as well as hold data that was read from an existing commit-graph. When the two mix, namely when the result of reading the commit-graph has a side-effect that mixes poorly with an in-progress commit-graph write, we end up with corrupt data. A complete fix might be to introduce a new slab that is used exclusively for writing, and gate access between the two slabs based on context provided by the caller (e.g., whether this computation is part of a "read" or "write" operation). But a more minimal fix addresses the only known path which overwrites the slab data, which is `compute_bloom_filters()` -> `get_or_compute_bloom_filter()` -> `load_commit_graph_info()` -> `fill_commit_graph_info()` by avoiding the last call which clobbers the data altogether. This path only needs to learn the graph position of a given commit so that it can be used in `load_bloom_filter_from_graph()`. By replacing the last steps of the above with one that records the graph position into a temporary variable which is then used to load the existing Bloom data, we eliminate the clobbering, removing the corruption. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-12 23:10:33 +00:00
test_expect_success 'overflow during generation version upgrade' '
git init overflow-v2-upgrade &&
(
cd overflow-v2-upgrade &&
# This commit will have a date at two seconds past the Epoch,
# and a (v1) generation number of 1, since it is a root commit.
#
# The offset will then be computed as 1-2, which will underflow
# to 2^31, which is greater than the v2 offset small limit of
# 2^31-1.
#
# This is sufficient to need a large offset table for the v2
# generation numbers.
test_commit --date "@2 +0000" base &&
git repack -d &&
# Test that upgrading from generation v1 to v2 correctly
# produces the overflow table.
git -c commitGraph.generationVersion=1 commit-graph write &&
git -c commitGraph.generationVersion=2 commit-graph write \
--changed-paths &&
git rev-list --all
)
'
corrupt_chunk () {
graph=full/.git/objects/info/commit-graph &&
test_when_finished "rm -rf $graph" &&
git -C full commit-graph write --reachable &&
corrupt_chunk_file $graph "$@"
}
check_corrupt_chunk () {
corrupt_chunk "$@" &&
git -C full -c core.commitGraph=false log >expect.out &&
git -C full -c core.commitGraph=true log >out 2>err &&
test_cmp expect.out out
}
test_expect_success 'reader notices too-small oid fanout chunk' '
# make it big enough that the graph file is plausible,
# otherwise we hit an earlier check
check_corrupt_chunk OIDF clear $(printf "000000%02x" $(test_seq 250)) &&
cat >expect.err <<-\EOF &&
error: commit-graph oid fanout chunk is wrong size
error: commit-graph required OID fanout chunk missing or corrupted
EOF
test_cmp expect.err err
'
commit-graph: check consistency of fanout table We use bsearch_hash() to look up items in the oid index of a commit-graph. It also has a fanout table to reduce the initial range in which we'll search. But since the fanout comes from the on-disk file, a corrupted or malicious file can cause us to look outside of the allocated index memory. One solution here would be to pass the total table size to bsearch_hash(), which could then bounds check the values it reads from the fanout. But there's an inexpensive up-front check we can do, and it's the same one used by the midx and pack idx code (both of which likewise have fanout tables and use bsearch_hash(), but are not affected by this bug): 1. We can check the value of the final fanout entry against the size of the table we got from the index chunk. These must always match, since the fanout is just slicing up the index. As a side note, the midx and pack idx code compute it the other way around: they use the final fanout value as the object count, and check the index size against it. Either is valid; if they disagree we cannot know which is wrong (a corrupted fanout value, or a too-small table of oids). 2. We can quickly scan the fanout table to make sure it is monotonically increasing. If it is, then we know that every value is less than or equal to the final value, and therefore less than or equal to the table size. It would also be sufficient to just check that each fanout value is smaller than the final one, but the midx and pack idx code both do a full monotonicity check. It's the same cost, and it catches some other corruptions (though not all; the checks done by "commit-graph verify" are more complete but more expensive, and our goal here is to be fast and memory-safe). There are two new tests. One just checks the final fanout value (this is the mirror image of the "too small oid lookup" case added for the midx in the previous commit; it's flipped here because commit-graph considers the oid lookup chunk to be the source of truth). The other actually creates a fanout with many out-of-bounds entries, and prior to this patch, it does cause the segfault you'd expect. But note that the error is not "your fanout entry is out-of-bounds", but rather "fanout value out of order". That's because we leave the final fanout value in place (to get past the table size check), making the index non-monotonic (the second-to-last entry is big, but the last one must remain small to match the actual table). We need adjustments to a few existing tests, as well: - an earlier test in t5318 corrupts the fanout and runs "commit-graph verify". Its message is now changed, since we catch the problem earlier (during the load step, rather than the careful validation step). - in t5324, we test that "commit-graph verify --shallow" does not do expensive verification on the base file of the chain. But the corruption it uses (munging a byte at offset 1000) happens to be in the middle of the fanout table. And now we detect that problem in the cheaper checks that are performed for every part of the graph. We'll push this back to offset 1500, which is only caught by the more expensive checksum validation. Likewise, there's a later test in t5324 which munges an offset 100 bytes into a file (also in the fanout table) that is referenced by an alternates file. So we now find that corruption during the load step, rather than the verification step. At the very least we need to change the error message (like the case above in t5318). But it is probably good to make sure we handle all parts of the verification even for alternate graph files. So let's likewise corrupt byte 1500 and make sure we found the invalid checksum. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09 21:04:58 +00:00
test_expect_success 'reader notices fanout/lookup table mismatch' '
check_corrupt_chunk OIDF 1020 "FFFFFFFF" &&
cat >expect.err <<-\EOF &&
commit-graph: use fanout value for graph size Commit-graph, midx, and pack idx files all have both a lookup table of oids and an oid fanout table. In midx and pack idx files, we take the final entry of the fanout table as the source of truth for the number of entries, and then verify that the size of the lookup table matches that. But for commit-graph files, we do the opposite: we use the size of the lookup table as the source of truth, and then check the final fanout entry against it. As noted in 4169d89645 (commit-graph: check consistency of fanout table, 2023-10-09), either is correct. But there are a few reasons to prefer the fanout table as the source of truth: 1. The fanout entries are 32-bits on disk, and that defines the maximum number of entries we can store. But since the size of the lookup table is only bounded by the filesystem, it can be much larger. And hence computing it as the commit-graph does means that we may truncate the result when storing it in a uint32_t. 2. We read the fanout first, then the lookup table. If we're verifying the chunks as we read them, then we'd want to take the fanout as truth (we have nothing yet to check it against) and then we can check that the lookup table matches what we already know. 3. It is pointlessly inconsistent with the midx and pack idx code. Since the three have to do similar size and bounds checks, it is easier to reason about all three if they use the same approach. So this patch moves the assignment of g->num_commits to the fanout parser, and then we can check the size of the lookup chunk as soon as we try to load it. There's already a test covering this situation, which munges the final fanout entry to 2^32-1. In the current code we complain that it does not agree with the table size. But now that we treat the munged value as the source of truth, we'll complain that the lookup table is the wrong size (again, either is correct). So we'll have to update the message we expect (and likewise for an earlier test which does similar munging). There's a similar test for this situation on the midx side, but rather than making a very-large fanout value, it just truncates the lookup table. We could do that here, too, but the very-large fanout value actually shows an interesting corner case. On a 32-bit system, multiplying to find the expected table size would cause an integer overflow. Using st_mult() would detect that, but cause us to die() rather than falling back to the non-graph code path. Checking the size using division (as we do with existing chunk-size checks) avoids the overflow entirely, and the test demonstrates this when run on a 32-bit system. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-09 07:24:35 +00:00
error: commit-graph OID lookup chunk is the wrong size
error: commit-graph required OID lookup chunk missing or corrupted
commit-graph: check consistency of fanout table We use bsearch_hash() to look up items in the oid index of a commit-graph. It also has a fanout table to reduce the initial range in which we'll search. But since the fanout comes from the on-disk file, a corrupted or malicious file can cause us to look outside of the allocated index memory. One solution here would be to pass the total table size to bsearch_hash(), which could then bounds check the values it reads from the fanout. But there's an inexpensive up-front check we can do, and it's the same one used by the midx and pack idx code (both of which likewise have fanout tables and use bsearch_hash(), but are not affected by this bug): 1. We can check the value of the final fanout entry against the size of the table we got from the index chunk. These must always match, since the fanout is just slicing up the index. As a side note, the midx and pack idx code compute it the other way around: they use the final fanout value as the object count, and check the index size against it. Either is valid; if they disagree we cannot know which is wrong (a corrupted fanout value, or a too-small table of oids). 2. We can quickly scan the fanout table to make sure it is monotonically increasing. If it is, then we know that every value is less than or equal to the final value, and therefore less than or equal to the table size. It would also be sufficient to just check that each fanout value is smaller than the final one, but the midx and pack idx code both do a full monotonicity check. It's the same cost, and it catches some other corruptions (though not all; the checks done by "commit-graph verify" are more complete but more expensive, and our goal here is to be fast and memory-safe). There are two new tests. One just checks the final fanout value (this is the mirror image of the "too small oid lookup" case added for the midx in the previous commit; it's flipped here because commit-graph considers the oid lookup chunk to be the source of truth). The other actually creates a fanout with many out-of-bounds entries, and prior to this patch, it does cause the segfault you'd expect. But note that the error is not "your fanout entry is out-of-bounds", but rather "fanout value out of order". That's because we leave the final fanout value in place (to get past the table size check), making the index non-monotonic (the second-to-last entry is big, but the last one must remain small to match the actual table). We need adjustments to a few existing tests, as well: - an earlier test in t5318 corrupts the fanout and runs "commit-graph verify". Its message is now changed, since we catch the problem earlier (during the load step, rather than the careful validation step). - in t5324, we test that "commit-graph verify --shallow" does not do expensive verification on the base file of the chain. But the corruption it uses (munging a byte at offset 1000) happens to be in the middle of the fanout table. And now we detect that problem in the cheaper checks that are performed for every part of the graph. We'll push this back to offset 1500, which is only caught by the more expensive checksum validation. Likewise, there's a later test in t5324 which munges an offset 100 bytes into a file (also in the fanout table) that is referenced by an alternates file. So we now find that corruption during the load step, rather than the verification step. At the very least we need to change the error message (like the case above in t5318). But it is probably good to make sure we handle all parts of the verification even for alternate graph files. So let's likewise corrupt byte 1500 and make sure we found the invalid checksum. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09 21:04:58 +00:00
EOF
test_cmp expect.err err
'
test_expect_success 'reader notices out-of-bounds fanout' '
# Rather than try to corrupt a specific hash, we will just
# wreck them all. But we cannot just set them all to 0xFFFFFFFF or
# similar, as they are used for hi/lo starts in a binary search (so if
# they are identical, that indicates that the search should abort
# immediately). Instead, we will give them high values that differ by
# 2^24, ensuring that any that are used would cause an out-of-bounds
# read.
check_corrupt_chunk OIDF 0 $(printf "%02x000000" $(test_seq 0 254)) &&
cat >expect.err <<-\EOF &&
error: commit-graph fanout values out of order
error: commit-graph required OID fanout chunk missing or corrupted
commit-graph: check consistency of fanout table We use bsearch_hash() to look up items in the oid index of a commit-graph. It also has a fanout table to reduce the initial range in which we'll search. But since the fanout comes from the on-disk file, a corrupted or malicious file can cause us to look outside of the allocated index memory. One solution here would be to pass the total table size to bsearch_hash(), which could then bounds check the values it reads from the fanout. But there's an inexpensive up-front check we can do, and it's the same one used by the midx and pack idx code (both of which likewise have fanout tables and use bsearch_hash(), but are not affected by this bug): 1. We can check the value of the final fanout entry against the size of the table we got from the index chunk. These must always match, since the fanout is just slicing up the index. As a side note, the midx and pack idx code compute it the other way around: they use the final fanout value as the object count, and check the index size against it. Either is valid; if they disagree we cannot know which is wrong (a corrupted fanout value, or a too-small table of oids). 2. We can quickly scan the fanout table to make sure it is monotonically increasing. If it is, then we know that every value is less than or equal to the final value, and therefore less than or equal to the table size. It would also be sufficient to just check that each fanout value is smaller than the final one, but the midx and pack idx code both do a full monotonicity check. It's the same cost, and it catches some other corruptions (though not all; the checks done by "commit-graph verify" are more complete but more expensive, and our goal here is to be fast and memory-safe). There are two new tests. One just checks the final fanout value (this is the mirror image of the "too small oid lookup" case added for the midx in the previous commit; it's flipped here because commit-graph considers the oid lookup chunk to be the source of truth). The other actually creates a fanout with many out-of-bounds entries, and prior to this patch, it does cause the segfault you'd expect. But note that the error is not "your fanout entry is out-of-bounds", but rather "fanout value out of order". That's because we leave the final fanout value in place (to get past the table size check), making the index non-monotonic (the second-to-last entry is big, but the last one must remain small to match the actual table). We need adjustments to a few existing tests, as well: - an earlier test in t5318 corrupts the fanout and runs "commit-graph verify". Its message is now changed, since we catch the problem earlier (during the load step, rather than the careful validation step). - in t5324, we test that "commit-graph verify --shallow" does not do expensive verification on the base file of the chain. But the corruption it uses (munging a byte at offset 1000) happens to be in the middle of the fanout table. And now we detect that problem in the cheaper checks that are performed for every part of the graph. We'll push this back to offset 1500, which is only caught by the more expensive checksum validation. Likewise, there's a later test in t5324 which munges an offset 100 bytes into a file (also in the fanout table) that is referenced by an alternates file. So we now find that corruption during the load step, rather than the verification step. At the very least we need to change the error message (like the case above in t5318). But it is probably good to make sure we handle all parts of the verification even for alternate graph files. So let's likewise corrupt byte 1500 and make sure we found the invalid checksum. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09 21:04:58 +00:00
EOF
test_cmp expect.err err
'
test_expect_success 'reader notices too-small commit data chunk' '
check_corrupt_chunk CDAT clear 00000000 &&
cat >expect.err <<-\EOF &&
error: commit-graph commit data chunk is wrong size
error: commit-graph required commit data chunk missing or corrupted
EOF
test_cmp expect.err err
'
commit-graph: detect out-of-bounds extra-edges pointers If an entry in a commit-graph file has more than 2 parents, the fixed-size parent fields instead point to an offset within an "extra edges" chunk. We blindly follow these, assuming that the chunk is present and sufficiently large; this can lead to an out-of-bounds read for a corrupt or malicious file. We can fix this by recording the size of the chunk and adding a bounds-check in fill_commit_in_graph(). There are a few tricky bits: 1. We'll switch from working with a pointer to an offset. This makes some corner cases just fall out naturally: a. If we did not find an EDGE chunk at all, our size will correctly be zero (so everything is "out of bounds"). b. Comparing "size / 4" lets us make sure we have at least 4 bytes to read, and we never compute a pointer more than one element past the end of the array (computing a larger pointer is probably OK in practice, but is technically undefined behavior). c. The current code casts to "uint32_t *". Replacing it with an offset avoids any comparison between different types of pointer (since the chunk is stored as "unsigned char *"). 2. This is the first case in which fill_commit_in_graph() may return anything but success. We need to make sure to roll back the "parsed" flag (and any parents we might have added before running out of buffer) so that the caller can cleanly fall back to loading the commit object itself. It's a little non-trivial to do this, and we might benefit from factoring it out. But we can wait on that until we actually see a second case where we return an error. As a bonus, this lets us drop the st_mult() call. Since we've already done a bounds check, we know there won't be any integer overflow (it would imply our buffer is larger than a size_t can hold). The included test does not actually segfault before this patch (though you could construct a case where it does). Instead, it reads garbage from the next chunk which results in it complaining about a bogus parent id. This is sufficient for our needs, though (we care that the fallback succeeds, and that stderr mentions the out-of-bounds read). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-09 21:05:38 +00:00
test_expect_success 'reader notices out-of-bounds extra edge' '
check_corrupt_chunk EDGE clear &&
cat >expect.err <<-\EOF &&
error: commit-graph extra-edges pointer out of bounds
EOF
test_cmp expect.err err
'
test_expect_success 'reader notices too-small generations chunk' '
check_corrupt_chunk GDA2 clear 00000000 &&
cat >expect.err <<-\EOF &&
error: commit-graph generations chunk is wrong size
EOF
test_cmp expect.err err
'
commit-graph: introduce envvar to disable commit existence checks Our `lookup_commit_in_graph()` helper tries to look up commits from the commit graph and, if it doesn't exist there, falls back to parsing it from the object database instead. This is intended to speed up the lookup of any such commit that exists in the database. There is an edge case though where the commit exists in the graph, but not in the object database. To avoid returning such stale commits the helper function thus double checks that any such commit parsed from the graph also exists in the object database. This makes the function safe to use even when commit graphs aren't updated regularly. We're about to introduce the same pattern into other parts of our code base though, namely `repo_parse_commit_internal()`. Here the extra sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was a newly introduced helper, and as such there was no performance hit by adding this sanity check. If we added `repo_parse_commit_internal()` with that sanity check right from the beginning as well, this would probably never have been an issue to begin with. But by retrofitting it with this sanity check now we do add a performance regression to preexisting code, and thus there is a desire to avoid this or at least give an escape hatch. In practice, there is no inherent reason why either of those functions should have the sanity check whereas the other one does not: either both of them are able to detect this issue or none of them should be. This also means that the default of whether we do the check should likely be the same for both. To err on the side of caution, we thus rather want to make `repo_parse_commit_internal()` stricter than to loosen the checks that we already have in `lookup_commit_in_graph()`. The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is the default, we will double check that commits looked up in the commit graph via `lookup_commit_in_graph()` also exist in the object database. This same check will also be added in `repo_parse_commit_internal()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-31 07:16:13 +00:00
test_expect_success 'stale commit cannot be parsed when given directly' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
cd repo &&
test_commit A &&
test_commit B &&
git commit-graph write --reachable &&
oid=$(git rev-parse B) &&
rm .git/objects/"$(test_oid_to_path "$oid")" &&
# Verify that it is possible to read the commit from the
# commit graph when not being paranoid, ...
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default In 7a5d604443 (commit: detect commits that exist in commit-graph but not in the ODB, 2023-10-31), we have introduced a new object existence check into `repo_parse_commit_internal()` so that we do not parse commits via the commit-graph that don't have a corresponding object in the object database. This new check of course comes with a performance penalty, which the commit put at around 30% for `git rev-list --topo-order`. But there are in fact scenarios where the performance regression is even higher. The following benchmark against linux.git with a fully-build commit-graph: Benchmark 1: git.v2.42.1 rev-list --count HEAD Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms] Range (min … max): 650.2 ms … 666.0 ms 10 runs Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s] Range (min … max): 1.302 s … 1.361 s 10 runs Summary git.v2.42.1 rev-list --count HEAD ran 2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD While it's a noble goal to ensure that results are the same regardless of whether or not we have a potentially stale commit-graph, taking twice as much time is a tough sell. Furthermore, we can generally assume that the commit-graph will be updated by git-gc(1) or git-maintenance(1) as required so that the case where the commit-graph is stale should not at all be common. With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore the behaviour and thus performance previous to the mentioned commit. In order to not be inconsistent, also disable this behaviour by default in `lookup_commit_in_graph()`, where the object existence check has been introduced right at its inception via f559d6d45e (revision: avoid hitting packfiles when commits are in commit-graph, 2021-08-09). This results in another speedup in commands that end up calling this function, even though it's less pronounced compared to the above benchmark. The following has been executed in linux.git with ~1.2 million references: Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s] Range (min … max): 2.943 s … 2.949 s 3 runs Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s] Range (min … max): 2.704 s … 2.759 s 3 runs Summary GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran 1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted So whereas 7a5d604443 initially introduced the logic to start doing an object existence check in `repo_parse_commit_internal()` by default, the updated logic will now instead cause `lookup_commit_in_graph()` to stop doing the check by default. This behaviour continues to be tweakable by the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable. Note that this requires us to amend some tests to manually turn on the paranoid checks again. This is because we cause repository corruption by manually deleting objects which are part of the commit graph already. These circumstances shouldn't usually happen in repositories. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-24 11:08:21 +00:00
git rev-list B &&
commit-graph: introduce envvar to disable commit existence checks Our `lookup_commit_in_graph()` helper tries to look up commits from the commit graph and, if it doesn't exist there, falls back to parsing it from the object database instead. This is intended to speed up the lookup of any such commit that exists in the database. There is an edge case though where the commit exists in the graph, but not in the object database. To avoid returning such stale commits the helper function thus double checks that any such commit parsed from the graph also exists in the object database. This makes the function safe to use even when commit graphs aren't updated regularly. We're about to introduce the same pattern into other parts of our code base though, namely `repo_parse_commit_internal()`. Here the extra sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was a newly introduced helper, and as such there was no performance hit by adding this sanity check. If we added `repo_parse_commit_internal()` with that sanity check right from the beginning as well, this would probably never have been an issue to begin with. But by retrofitting it with this sanity check now we do add a performance regression to preexisting code, and thus there is a desire to avoid this or at least give an escape hatch. In practice, there is no inherent reason why either of those functions should have the sanity check whereas the other one does not: either both of them are able to detect this issue or none of them should be. This also means that the default of whether we do the check should likely be the same for both. To err on the side of caution, we thus rather want to make `repo_parse_commit_internal()` stricter than to loosen the checks that we already have in `lookup_commit_in_graph()`. The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is the default, we will double check that commits looked up in the commit graph via `lookup_commit_in_graph()` also exist in the object database. This same check will also be added in `repo_parse_commit_internal()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-31 07:16:13 +00:00
# ... but parsing the commit when double checking that
# it actually exists in the object database should fail.
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default In 7a5d604443 (commit: detect commits that exist in commit-graph but not in the ODB, 2023-10-31), we have introduced a new object existence check into `repo_parse_commit_internal()` so that we do not parse commits via the commit-graph that don't have a corresponding object in the object database. This new check of course comes with a performance penalty, which the commit put at around 30% for `git rev-list --topo-order`. But there are in fact scenarios where the performance regression is even higher. The following benchmark against linux.git with a fully-build commit-graph: Benchmark 1: git.v2.42.1 rev-list --count HEAD Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms] Range (min … max): 650.2 ms … 666.0 ms 10 runs Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s] Range (min … max): 1.302 s … 1.361 s 10 runs Summary git.v2.42.1 rev-list --count HEAD ran 2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD While it's a noble goal to ensure that results are the same regardless of whether or not we have a potentially stale commit-graph, taking twice as much time is a tough sell. Furthermore, we can generally assume that the commit-graph will be updated by git-gc(1) or git-maintenance(1) as required so that the case where the commit-graph is stale should not at all be common. With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore the behaviour and thus performance previous to the mentioned commit. In order to not be inconsistent, also disable this behaviour by default in `lookup_commit_in_graph()`, where the object existence check has been introduced right at its inception via f559d6d45e (revision: avoid hitting packfiles when commits are in commit-graph, 2021-08-09). This results in another speedup in commands that end up calling this function, even though it's less pronounced compared to the above benchmark. The following has been executed in linux.git with ~1.2 million references: Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s] Range (min … max): 2.943 s … 2.949 s 3 runs Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s] Range (min … max): 2.704 s … 2.759 s 3 runs Summary GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran 1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted So whereas 7a5d604443 initially introduced the logic to start doing an object existence check in `repo_parse_commit_internal()` by default, the updated logic will now instead cause `lookup_commit_in_graph()` to stop doing the check by default. This behaviour continues to be tweakable by the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable. Note that this requires us to amend some tests to manually turn on the paranoid checks again. This is because we cause repository corruption by manually deleting objects which are part of the commit graph already. These circumstances shouldn't usually happen in repositories. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-24 11:08:21 +00:00
test_must_fail env GIT_COMMIT_GRAPH_PARANOIA=true git rev-list -1 B
commit-graph: introduce envvar to disable commit existence checks Our `lookup_commit_in_graph()` helper tries to look up commits from the commit graph and, if it doesn't exist there, falls back to parsing it from the object database instead. This is intended to speed up the lookup of any such commit that exists in the database. There is an edge case though where the commit exists in the graph, but not in the object database. To avoid returning such stale commits the helper function thus double checks that any such commit parsed from the graph also exists in the object database. This makes the function safe to use even when commit graphs aren't updated regularly. We're about to introduce the same pattern into other parts of our code base though, namely `repo_parse_commit_internal()`. Here the extra sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was a newly introduced helper, and as such there was no performance hit by adding this sanity check. If we added `repo_parse_commit_internal()` with that sanity check right from the beginning as well, this would probably never have been an issue to begin with. But by retrofitting it with this sanity check now we do add a performance regression to preexisting code, and thus there is a desire to avoid this or at least give an escape hatch. In practice, there is no inherent reason why either of those functions should have the sanity check whereas the other one does not: either both of them are able to detect this issue or none of them should be. This also means that the default of whether we do the check should likely be the same for both. To err on the side of caution, we thus rather want to make `repo_parse_commit_internal()` stricter than to loosen the checks that we already have in `lookup_commit_in_graph()`. The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is the default, we will double check that commits looked up in the commit graph via `lookup_commit_in_graph()` also exist in the object database. This same check will also be added in `repo_parse_commit_internal()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-31 07:16:13 +00:00
)
'
commit: detect commits that exist in commit-graph but not in the ODB Commit graphs can become stale and contain references to commits that do not exist in the object database anymore. Theoretically, this can lead to a scenario where we are able to successfully look up any such commit via the commit graph even though such a lookup would fail if done via the object database directly. As the commit graph is mostly intended as a sort of cache to speed up parsing of commits we do not want to have diverging behaviour in a repository with and a repository without commit graphs, no matter whether they are stale or not. As commits are otherwise immutable, the only thing that we really need to care about is thus the presence or absence of a commit. To address potentially stale commit data that may exist in the graph, our `lookup_commit_in_graph()` function will check for the commit's existence in both the commit graph, but also in the object database. So even if we were able to look up the commit's data in the graph, we would still pretend as if the commit didn't exist if it is missing in the object database. We don't have the same safety net in `parse_commit_in_graph_one()` though. This function is mostly used internally in "commit-graph.c" itself to validate the commit graph, and this usage is fine. We do expose its functionality via `parse_commit_in_graph()` though, which gets called by `repo_parse_commit_internal()`, and that function is in turn used in many places in our codebase. For all I can see this function is never used to directly turn an object ID into a commit object without additional safety checks before or after this lookup. What it is being used for though is to walk history via the parent chain of commits. So when commits in the parent chain of a graph walk are missing it is possible that we wouldn't notice if that missing commit was part of the commit graph. Thus, a query like `git rev-parse HEAD~2` can succeed even if the intermittent commit is missing. It's unclear whether there are additional ways in which such stale commit graphs can lead to problems. In any case, it feels like this is a bigger bug waiting to happen when we gain additional direct or indirect callers of `repo_parse_commit_internal()`. So let's fix the inconsistent behaviour by checking for object existence via the object database, as well. This check of course comes with a performance penalty. The following benchmarks have been executed in a clone of linux.git with stable tags added: Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master) Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s] Range (min … max): 2.894 s … 2.950 s 10 runs Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s] Range (min … max): 3.780 s … 3.961 s 10 runs Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master) Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s] Range (min … max): 13.714 s … 13.995 s 10 runs Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s] Range (min … max): 13.645 s … 14.038 s 10 runs Summary git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran 1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) 4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) 4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master) We look at a ~30% regression in general, but in general we're still a whole lot faster than without the commit graph. To counteract this, the new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-31 07:16:18 +00:00
test_expect_success 'stale commit cannot be parsed when traversing graph' '
test_when_finished "rm -rf repo" &&
git init repo &&
(
cd repo &&
test_commit A &&
test_commit B &&
test_commit C &&
git commit-graph write --reachable &&
# Corrupt the repository by deleting the intermediate commit
# object. Commands should notice that this object is absent and
# thus that the repository is corrupt even if the commit graph
# exists.
oid=$(git rev-parse B) &&
rm .git/objects/"$(test_oid_to_path "$oid")" &&
# Again, we should be able to parse the commit when not
# being paranoid about commit graph staleness...
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default In 7a5d604443 (commit: detect commits that exist in commit-graph but not in the ODB, 2023-10-31), we have introduced a new object existence check into `repo_parse_commit_internal()` so that we do not parse commits via the commit-graph that don't have a corresponding object in the object database. This new check of course comes with a performance penalty, which the commit put at around 30% for `git rev-list --topo-order`. But there are in fact scenarios where the performance regression is even higher. The following benchmark against linux.git with a fully-build commit-graph: Benchmark 1: git.v2.42.1 rev-list --count HEAD Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms] Range (min … max): 650.2 ms … 666.0 ms 10 runs Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s] Range (min … max): 1.302 s … 1.361 s 10 runs Summary git.v2.42.1 rev-list --count HEAD ran 2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD While it's a noble goal to ensure that results are the same regardless of whether or not we have a potentially stale commit-graph, taking twice as much time is a tough sell. Furthermore, we can generally assume that the commit-graph will be updated by git-gc(1) or git-maintenance(1) as required so that the case where the commit-graph is stale should not at all be common. With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore the behaviour and thus performance previous to the mentioned commit. In order to not be inconsistent, also disable this behaviour by default in `lookup_commit_in_graph()`, where the object existence check has been introduced right at its inception via f559d6d45e (revision: avoid hitting packfiles when commits are in commit-graph, 2021-08-09). This results in another speedup in commands that end up calling this function, even though it's less pronounced compared to the above benchmark. The following has been executed in linux.git with ~1.2 million references: Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s] Range (min … max): 2.943 s … 2.949 s 3 runs Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s] Range (min … max): 2.704 s … 2.759 s 3 runs Summary GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran 1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted So whereas 7a5d604443 initially introduced the logic to start doing an object existence check in `repo_parse_commit_internal()` by default, the updated logic will now instead cause `lookup_commit_in_graph()` to stop doing the check by default. This behaviour continues to be tweakable by the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable. Note that this requires us to amend some tests to manually turn on the paranoid checks again. This is because we cause repository corruption by manually deleting objects which are part of the commit graph already. These circumstances shouldn't usually happen in repositories. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-24 11:08:21 +00:00
git rev-parse HEAD~2 &&
commit: detect commits that exist in commit-graph but not in the ODB Commit graphs can become stale and contain references to commits that do not exist in the object database anymore. Theoretically, this can lead to a scenario where we are able to successfully look up any such commit via the commit graph even though such a lookup would fail if done via the object database directly. As the commit graph is mostly intended as a sort of cache to speed up parsing of commits we do not want to have diverging behaviour in a repository with and a repository without commit graphs, no matter whether they are stale or not. As commits are otherwise immutable, the only thing that we really need to care about is thus the presence or absence of a commit. To address potentially stale commit data that may exist in the graph, our `lookup_commit_in_graph()` function will check for the commit's existence in both the commit graph, but also in the object database. So even if we were able to look up the commit's data in the graph, we would still pretend as if the commit didn't exist if it is missing in the object database. We don't have the same safety net in `parse_commit_in_graph_one()` though. This function is mostly used internally in "commit-graph.c" itself to validate the commit graph, and this usage is fine. We do expose its functionality via `parse_commit_in_graph()` though, which gets called by `repo_parse_commit_internal()`, and that function is in turn used in many places in our codebase. For all I can see this function is never used to directly turn an object ID into a commit object without additional safety checks before or after this lookup. What it is being used for though is to walk history via the parent chain of commits. So when commits in the parent chain of a graph walk are missing it is possible that we wouldn't notice if that missing commit was part of the commit graph. Thus, a query like `git rev-parse HEAD~2` can succeed even if the intermittent commit is missing. It's unclear whether there are additional ways in which such stale commit graphs can lead to problems. In any case, it feels like this is a bigger bug waiting to happen when we gain additional direct or indirect callers of `repo_parse_commit_internal()`. So let's fix the inconsistent behaviour by checking for object existence via the object database, as well. This check of course comes with a performance penalty. The following benchmarks have been executed in a clone of linux.git with stable tags added: Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master) Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s] Range (min … max): 2.894 s … 2.950 s 10 runs Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s] Range (min … max): 3.780 s … 3.961 s 10 runs Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master) Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s] Range (min … max): 13.714 s … 13.995 s 10 runs Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s] Range (min … max): 13.645 s … 14.038 s 10 runs Summary git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran 1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) 4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) 4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master) We look at a ~30% regression in general, but in general we're still a whole lot faster than without the commit graph. To counteract this, the new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-31 07:16:18 +00:00
# ... but fail when we are paranoid.
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default In 7a5d604443 (commit: detect commits that exist in commit-graph but not in the ODB, 2023-10-31), we have introduced a new object existence check into `repo_parse_commit_internal()` so that we do not parse commits via the commit-graph that don't have a corresponding object in the object database. This new check of course comes with a performance penalty, which the commit put at around 30% for `git rev-list --topo-order`. But there are in fact scenarios where the performance regression is even higher. The following benchmark against linux.git with a fully-build commit-graph: Benchmark 1: git.v2.42.1 rev-list --count HEAD Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms] Range (min … max): 650.2 ms … 666.0 ms 10 runs Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s] Range (min … max): 1.302 s … 1.361 s 10 runs Summary git.v2.42.1 rev-list --count HEAD ran 2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD While it's a noble goal to ensure that results are the same regardless of whether or not we have a potentially stale commit-graph, taking twice as much time is a tough sell. Furthermore, we can generally assume that the commit-graph will be updated by git-gc(1) or git-maintenance(1) as required so that the case where the commit-graph is stale should not at all be common. With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore the behaviour and thus performance previous to the mentioned commit. In order to not be inconsistent, also disable this behaviour by default in `lookup_commit_in_graph()`, where the object existence check has been introduced right at its inception via f559d6d45e (revision: avoid hitting packfiles when commits are in commit-graph, 2021-08-09). This results in another speedup in commands that end up calling this function, even though it's less pronounced compared to the above benchmark. The following has been executed in linux.git with ~1.2 million references: Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s] Range (min … max): 2.943 s … 2.949 s 3 runs Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s] Range (min … max): 2.704 s … 2.759 s 3 runs Summary GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran 1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted So whereas 7a5d604443 initially introduced the logic to start doing an object existence check in `repo_parse_commit_internal()` by default, the updated logic will now instead cause `lookup_commit_in_graph()` to stop doing the check by default. This behaviour continues to be tweakable by the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable. Note that this requires us to amend some tests to manually turn on the paranoid checks again. This is because we cause repository corruption by manually deleting objects which are part of the commit graph already. These circumstances shouldn't usually happen in repositories. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-24 11:08:21 +00:00
test_must_fail env GIT_COMMIT_GRAPH_PARANOIA=true git rev-parse HEAD~2 2>error &&
commit: detect commits that exist in commit-graph but not in the ODB Commit graphs can become stale and contain references to commits that do not exist in the object database anymore. Theoretically, this can lead to a scenario where we are able to successfully look up any such commit via the commit graph even though such a lookup would fail if done via the object database directly. As the commit graph is mostly intended as a sort of cache to speed up parsing of commits we do not want to have diverging behaviour in a repository with and a repository without commit graphs, no matter whether they are stale or not. As commits are otherwise immutable, the only thing that we really need to care about is thus the presence or absence of a commit. To address potentially stale commit data that may exist in the graph, our `lookup_commit_in_graph()` function will check for the commit's existence in both the commit graph, but also in the object database. So even if we were able to look up the commit's data in the graph, we would still pretend as if the commit didn't exist if it is missing in the object database. We don't have the same safety net in `parse_commit_in_graph_one()` though. This function is mostly used internally in "commit-graph.c" itself to validate the commit graph, and this usage is fine. We do expose its functionality via `parse_commit_in_graph()` though, which gets called by `repo_parse_commit_internal()`, and that function is in turn used in many places in our codebase. For all I can see this function is never used to directly turn an object ID into a commit object without additional safety checks before or after this lookup. What it is being used for though is to walk history via the parent chain of commits. So when commits in the parent chain of a graph walk are missing it is possible that we wouldn't notice if that missing commit was part of the commit graph. Thus, a query like `git rev-parse HEAD~2` can succeed even if the intermittent commit is missing. It's unclear whether there are additional ways in which such stale commit graphs can lead to problems. In any case, it feels like this is a bigger bug waiting to happen when we gain additional direct or indirect callers of `repo_parse_commit_internal()`. So let's fix the inconsistent behaviour by checking for object existence via the object database, as well. This check of course comes with a performance penalty. The following benchmarks have been executed in a clone of linux.git with stable tags added: Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master) Time (mean ± σ): 2.913 s ± 0.018 s [User: 2.363 s, System: 0.548 s] Range (min … max): 2.894 s … 2.950 s 10 runs Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) Time (mean ± σ): 3.834 s ± 0.052 s [User: 3.276 s, System: 0.556 s] Range (min … max): 3.780 s … 3.961 s 10 runs Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master) Time (mean ± σ): 13.841 s ± 0.084 s [User: 13.152 s, System: 0.687 s] Range (min … max): 13.714 s … 13.995 s 10 runs Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) Time (mean ± σ): 13.762 s ± 0.116 s [User: 13.094 s, System: 0.667 s] Range (min … max): 13.645 s … 14.038 s 10 runs Summary git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran 1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency) 4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency) 4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master) We look at a ~30% regression in general, but in general we're still a whole lot faster than without the commit graph. To counteract this, the new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-31 07:16:18 +00:00
grep "error: commit $oid exists in commit-graph but not in the object database" error
)
'
test_done