git/t/t5318-commit-graph.sh

#!/bin/sh
test_description='commit graph'
. ./test-lib.sh
. "$TEST_DIRECTORY"/lib-chunk.sh
GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0
test_expect_success 'usage' '
test_expect_code 129 git commit-graph write blah 2>err &&
test_expect_code 129 git commit-graph write verify
'
test_expect_success 'usage shown without sub-command' '
test_expect_code 129 git commit-graph 2>err &&
grep usage: err
'
test_expect_success 'usage shown with an error on unknown sub-command' '
cat >expect <<-\EOF &&
error: unknown subcommand: `unknown'\''
EOF
test_expect_code 129 git commit-graph unknown 2>stderr &&
grep error stderr >actual &&
test_cmp expect actual
'
objdir=".git/objects"
test_expect_success 'setup full repo' '
git init full
'
test_expect_success POSIXPERM 'tweak umask for modebit tests' '
umask 022
'
test_expect_success 'verify graph with no graph file' '
git -C full commit-graph verify
'
test_expect_success 'write graph with no packs' '
git -C full commit-graph write --object-dir $objdir &&
test_path_is_missing full/$objdir/info/commit-graph
'
test_expect_success 'exit with correct error on bad input to --stdin-packs' '
echo doesnotexist >in &&
test_expect_code 1 git -C full commit-graph write --stdin-packs \
<in 2>stderr &&
test_i18ngrep "error adding pack" stderr
'
test_expect_success 'create commits and repack' '
for i in $(test_seq 3)
do
test_commit -C full $i &&
git -C full branch commits/$i || return 1
done &&
git -C full repack
'
. "$TEST_DIRECTORY"/lib-commit-graph.sh
graph_git_behavior 'no graph' full commits/3 commits/1
test_expect_success 'exit with correct error on bad input to --stdin-commits' '
# invalid, non-hex OID
echo HEAD | test_expect_code 1 git -C full commit-graph write \
--stdin-commits 2>stderr &&
test_i18ngrep "unexpected non-hex object ID: HEAD" stderr &&
# non-existent OID
echo $ZERO_OID | test_expect_code 1 git -C full commit-graph write \
--stdin-commits 2>stderr &&
test_i18ngrep "invalid object" stderr &&
# valid commit and tree OID
git -C full rev-parse HEAD HEAD^{tree} >in &&
git -C full commit-graph write --stdin-commits <in &&
graph_read_expect -C full 3 generation_data
'
test_expect_success 'write graph' '
git -C full commit-graph write &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 3 generation_data
'
test_expect_success POSIXPERM 'write graph has correct permissions' '
test_path_is_file full/$objdir/info/commit-graph &&
echo "-r--r--r--" >expect &&
test_modebits full/$objdir/info/commit-graph >actual &&
test_cmp expect actual
'
graph_git_behavior 'graph exists' full commits/3 commits/1
test_expect_success 'Add more commits' '
git -C full reset --hard commits/1 &&
for i in $(test_seq 4 5)
do
test_commit -C full $i &&
git -C full branch commits/$i || return 1
done &&
git -C full reset --hard commits/2 &&
for i in $(test_seq 6 7)
do
test_commit -C full $i &&
git -C full branch commits/$i || return 1
done &&
git -C full reset --hard commits/2 &&
git -C full merge commits/4 &&
git -C full branch merge/1 &&
git -C full reset --hard commits/4 &&
git -C full merge commits/6 &&
git -C full branch merge/2 &&
git -C full reset --hard commits/3 &&
git -C full merge commits/5 commits/7 &&
git -C full branch merge/3 &&
git -C full repack
'
test_expect_success 'commit-graph write progress off for redirected stderr' '
git -C full commit-graph write 2>err &&
test_must_be_empty err
'
test_expect_success 'commit-graph write force progress on for stderr' '
GIT_PROGRESS_DELAY=0 git -C full commit-graph write --progress 2>err &&
test_file_not_empty err
'
test_expect_success 'commit-graph write with the --no-progress option' '
git -C full commit-graph write --no-progress 2>err &&
test_must_be_empty err
'
test_expect_success 'commit-graph write --stdin-commits progress off for redirected stderr' '
git -C full rev-parse commits/5 >in &&
git -C full commit-graph write --stdin-commits <in 2>err &&
test_must_be_empty err
'
test_expect_success 'commit-graph write --stdin-commits force progress on for stderr' '
git -C full rev-parse commits/5 >in &&
GIT_PROGRESS_DELAY=0 git -C full commit-graph write --stdin-commits \
--progress <in 2>err &&
test_i18ngrep "Collecting commits from input" err
'
test_expect_success 'commit-graph write --stdin-commits with the --no-progress option' '
git -C full rev-parse commits/5 >in &&
git -C full commit-graph write --stdin-commits --no-progress <in 2>err &&
test_must_be_empty err
'
test_expect_success 'commit-graph verify progress off for redirected stderr' '
git -C full commit-graph verify 2>err &&
test_must_be_empty err
'
test_expect_success 'commit-graph verify force progress on for stderr' '
GIT_PROGRESS_DELAY=0 git -C full commit-graph verify --progress 2>err &&
test_file_not_empty err
'
test_expect_success 'commit-graph verify with the --no-progress option' '
git -C full commit-graph verify --no-progress 2>err &&
test_must_be_empty err
'
# Current graph structure:
#
# __M3___
# / | \
# 3 M1 5 M2 7
# |/ \|/ \|
# 2 4 6
# |___/____/
# 1
test_expect_success 'write graph with merges' '
git -C full commit-graph write &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 10 "generation_data extra_edges"
'
graph_git_behavior 'merge 1 vs 2' full merge/1 merge/2
graph_git_behavior 'merge 1 vs 3' full merge/1 merge/3
graph_git_behavior 'merge 2 vs 3' full merge/2 merge/3
test_expect_success 'Add one more commit' '
test_commit -C full 8 &&
git -C full branch commits/8 &&
ls full/$objdir/pack | grep idx >existing-idx &&
git -C full repack &&
ls full/$objdir/pack | grep idx | grep -v -f existing-idx >new-idx
'
# Current graph structure:
#
# 8
# |
# __M3___
# / | \
# 3 M1 5 M2 7
# |/ \|/ \|
# 2 4 6
# |___/____/
# 1
graph_git_behavior 'mixed mode, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'mixed mode, commit 8 vs merge 2' full commits/8 merge/2
test_expect_success 'write graph with new commit' '
git -C full commit-graph write &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 11 "generation_data extra_edges"
'
graph_git_behavior 'full graph, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'full graph, commit 8 vs merge 2' full commits/8 merge/2
test_expect_success 'write graph with nothing new' '
git -C full commit-graph write &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 11 "generation_data extra_edges"
'
graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'cleared graph, commit 8 vs merge 2' full commits/8 merge/2
test_expect_success 'build graph from latest pack with closure' '
git -C full commit-graph write --stdin-packs <new-idx &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 9 "generation_data extra_edges"
'
graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'graph from pack, commit 8 vs merge 2' full commits/8 merge/2
test_expect_success 'build graph from commits with closure' '
git -C full tag -a -m "merge" tag/merge merge/2 &&
git -C full rev-parse tag/merge >commits-in &&
git -C full rev-parse merge/1 >>commits-in &&
git -C full commit-graph write --stdin-commits <commits-in &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 6 "generation_data"
'
graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'graph from commits, commit 8 vs merge 2' full commits/8 merge/2
test_expect_success 'build graph from commits with append' '
git -C full rev-parse merge/3 >in &&
git -C full commit-graph write --stdin-commits --append <in &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 10 "generation_data extra_edges"
'
graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
test_expect_success 'build graph using --reachable' '
git -C full commit-graph write --reachable &&
test_path_is_file full/$objdir/info/commit-graph &&
graph_read_expect -C full 11 "generation_data extra_edges"
'
graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
test_expect_success 'setup bare repo' '
git clone --bare --no-local full bare
'
graph_git_behavior 'bare repo, commit 8 vs merge 1' bare commits/8 merge/1
graph_git_behavior 'bare repo, commit 8 vs merge 2' bare commits/8 merge/2
test_expect_success 'write graph in bare repo' '
git -C bare commit-graph write &&
test_path_is_file bare/objects/info/commit-graph &&
graph_read_expect -C bare 11 "generation_data extra_edges"
'
graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 merge/1
graph_git_behavior 'bare repo with graph, commit 8 vs merge 2' bare commits/8 merge/2
test_expect_success 'perform fast-forward merge in full repo' '
git -C full checkout -b merge-5-to-8 commits/5 &&
git -C full merge commits/8 &&
git -C full show-ref -s merge-5-to-8 >output &&
git -C full show-ref -s commits/8 >expect &&
test_cmp expect output
'
test_expect_success 'check that gc computes commit-graph' '
test_commit -C full --no-tag blank &&
git -C full commit-graph write --reachable &&
cp full/$objdir/info/commit-graph commit-graph-before-gc &&
git -C full reset --hard HEAD~1 &&
test_config -C full gc.writeCommitGraph true &&
git -C full gc &&
cp full/$objdir/info/commit-graph commit-graph-after-gc &&
! test_cmp_bin commit-graph-before-gc commit-graph-after-gc &&
git -C full commit-graph write --reachable &&
test_cmp_bin commit-graph-after-gc full/$objdir/info/commit-graph
'
test_expect_success 'replace-objects invalidates commit-graph' '
test_when_finished rm -rf replace &&
git clone full replace &&
(
cd replace &&
git commit-graph write --reachable &&
test_path_is_file .git/objects/info/commit-graph &&
git replace HEAD~1 HEAD~2 &&
graph_git_two_modes "commit-graph verify" &&
git -c core.commitGraph=false log >expect &&
git -c core.commitGraph=true log >actual &&
test_cmp expect actual &&
git commit-graph write --reachable &&
git -c core.commitGraph=false --no-replace-objects log >expect &&
git -c core.commitGraph=true --no-replace-objects log >actual &&
test_cmp expect actual &&
rm -rf .git/objects/info/commit-graph &&
git commit-graph write --reachable &&
test_path_is_file .git/objects/info/commit-graph
)
'
test_expect_success 'commit grafts invalidate commit-graph' '
test_when_finished rm -rf graft &&
git clone --template= full graft &&
(
cd graft &&
git commit-graph write --reachable &&
test_path_is_file .git/objects/info/commit-graph &&
H1=$(git rev-parse --verify HEAD~1) &&
H3=$(git rev-parse --verify HEAD~3) &&
mkdir .git/info &&
echo "$H1 $H3" >.git/info/grafts &&
git -c core.commitGraph=false log >expect &&
git -c core.commitGraph=true log >actual &&
test_cmp expect actual &&
git commit-graph write --reachable &&
git -c core.commitGraph=false --no-replace-objects log >expect &&
git -c core.commitGraph=true --no-replace-objects log >actual &&
test_cmp expect actual &&
rm -rf .git/objects/info/commit-graph &&
git commit-graph write --reachable &&
test_path_is_missing .git/objects/info/commit-graph
)
'
test_expect_success 'shallow clones invalidate commit-graph' '
test_when_finished rm -rf shallow &&
git clone --depth 2 "file://$TRASH_DIRECTORY/full" shallow &&
(
cd shallow &&
git commit-graph write --reachable &&
test_path_is_missing .git/objects/info/commit-graph &&
git fetch origin --unshallow &&
git commit-graph write --reachable &&
test_path_is_file .git/objects/info/commit-graph
)
'
commit-graph: use the "hash version" byte The commit-graph format reserved a byte among the header of the file to store a "hash version". During the SHA-256 work, this was not modified because file formats are not necessarily intended to work across hash versions. If a repository has SHA-256 as its hash algorithm, it automatically up-shifts the lengths of object names in all necessary formats. However, since we have this byte available for adjusting the version, we can make the file formats more obviously incompatible instead of relying on other context from the repository. Update the oid_version() method in commit-graph.c to add a new value, 2, for sha-256. This automatically writes the new value in a SHA-256 repository _and_ verifies the value is correct. This is a breaking change relative to the current 'master' branch since 092b677 (Merge branch 'bc/sha-256-cvs-svn-updates', 2020-08-13) but it is not breaking relative to any released version of Git. The test impact is relatively minor: the output of 'test-tool read-graph' lists the header information, so those instances of '1' need to be replaced with a variable determined by GIT_TEST_DEFAULT_HASH. A more careful test is added that specifically creates a repository of each type then swaps the commit-graph files. The important value here is that the "git log" command succeeds while writing a message to stderr. Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 14:04:47 +00:00
test_expect_success 'warn on improper hash version' '
git init --object-format=sha1 sha1 &&
(
cd sha1 &&
test_commit 1 &&
git commit-graph write --reachable &&
mv .git/objects/info/commit-graph ../cg-sha1
) &&
git init --object-format=sha256 sha256 &&
(
cd sha256 &&
test_commit 1 &&
git commit-graph write --reachable &&
mv .git/objects/info/commit-graph ../cg-sha256
) &&
(
cd sha1 &&
mv ../cg-sha256 .git/objects/info/commit-graph &&
git log -1 2>err &&
test_i18ngrep "commit-graph hash version 2 does not match version 1" err
) &&
(
cd sha256 &&
mv ../cg-sha1 .git/objects/info/commit-graph &&
git log -1 2>err &&
test_i18ngrep "commit-graph hash version 1 does not match version 2" err
)
'
test_expect_success TIME_IS_64BIT,TIME_T_IS_64BIT 'lower layers have overflow chunk' '
UNIX_EPOCH_ZERO="@0 +0000" &&
FUTURE_DATE="@4147483646 +0000" &&
rm -f full/.git/objects/info/commit-graph &&
test_commit -C full --date "$FUTURE_DATE" future-1 &&
test_commit -C full --date "$UNIX_EPOCH_ZERO" old-1 &&
git -C full commit-graph write --reachable &&
test_commit -C full --date "$FUTURE_DATE" future-2 &&
test_commit -C full --date "$UNIX_EPOCH_ZERO" old-2 &&
git -C full commit-graph write --reachable --split=no-merge &&
test_commit -C full extra &&
git -C full commit-graph write --reachable --split=no-merge &&
git -C full commit-graph write --reachable &&
graph_read_expect -C full 16 \
"generation_data generation_data_overflow extra_edges" &&
mv full/.git/objects/info/commit-graph commit-graph-upgraded &&
git -C full commit-graph write --reachable &&
graph_read_expect -C full 16 \
"generation_data generation_data_overflow extra_edges" &&
test_cmp full/.git/objects/info/commit-graph commit-graph-upgraded
'
# the verify tests below expect the commit-graph to contain
# exactly the commits reachable from the commits/8 branch.
# If the file changes the set of commits in the list, then the
# offsets into the binary file will result in different edits
# and the tests will likely break.
test_expect_success 'git commit-graph verify' '
git -C full rev-parse commits/8 >in &&
git -C full -c commitGraph.generationVersion=1 commit-graph write \
--stdin-commits <in &&
git -C full commit-graph verify >output &&
graph_read_expect -C full 9 extra_edges 1
'
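# The constants below describe the on-disk layout of the file written
# above: an 8-byte header (signature, version, hash version, chunk
# count, base-graph count), a chunk lookup table of 12-byte rows (4-byte
# chunk ID plus 8-byte offset), then the chunks themselves.  Writing
# with commitGraph.generationVersion=1 leaves out the generation-data
# chunk, so the lookup table holds OID fanout, OID lookup, commit data
# and extra edges plus the terminating row, i.e. the five rows assumed
# by GRAPH_CHUNK_LOOKUP_ROWS.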
NUM_COMMITS=9
NUM_OCTOPUS_EDGES=2
HASH_LEN="$(test_oid rawsz)"
GRAPH_BYTE_VERSION=4
GRAPH_BYTE_HASH=5
GRAPH_BYTE_CHUNK_COUNT=6
GRAPH_CHUNK_LOOKUP_OFFSET=8
GRAPH_CHUNK_LOOKUP_WIDTH=12
GRAPH_CHUNK_LOOKUP_ROWS=5
GRAPH_BYTE_OID_FANOUT_ID=$GRAPH_CHUNK_LOOKUP_OFFSET
GRAPH_BYTE_OID_LOOKUP_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
1 * $GRAPH_CHUNK_LOOKUP_WIDTH))
GRAPH_BYTE_COMMIT_DATA_ID=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
2 * $GRAPH_CHUNK_LOOKUP_WIDTH))
GRAPH_FANOUT_OFFSET=$(($GRAPH_CHUNK_LOOKUP_OFFSET + \
$GRAPH_CHUNK_LOOKUP_WIDTH * $GRAPH_CHUNK_LOOKUP_ROWS))
GRAPH_BYTE_FANOUT1=$(($GRAPH_FANOUT_OFFSET + 4 * 4))
GRAPH_BYTE_FANOUT2=$(($GRAPH_FANOUT_OFFSET + 4 * 255))
GRAPH_OID_LOOKUP_OFFSET=$(($GRAPH_FANOUT_OFFSET + 4 * 256))
GRAPH_BYTE_OID_LOOKUP_ORDER=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 8))
GRAPH_BYTE_OID_LOOKUP_MISSING=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * 4 + 10))
GRAPH_COMMIT_DATA_WIDTH=$(($HASH_LEN + 16))
GRAPH_COMMIT_DATA_OFFSET=$(($GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN * $NUM_COMMITS))
GRAPH_BYTE_COMMIT_TREE=$GRAPH_COMMIT_DATA_OFFSET
GRAPH_BYTE_COMMIT_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN))
GRAPH_BYTE_COMMIT_EXTRA_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 4))
GRAPH_BYTE_COMMIT_WRONG_PARENT=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 3))
GRAPH_BYTE_COMMIT_GENERATION=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 11))
GRAPH_BYTE_COMMIT_GENERATION_LAST=$(($GRAPH_BYTE_COMMIT_GENERATION + $(($NUM_COMMITS - 1)) * $GRAPH_COMMIT_DATA_WIDTH))
GRAPH_BYTE_COMMIT_DATE=$(($GRAPH_COMMIT_DATA_OFFSET + $HASH_LEN + 12))
GRAPH_OCTOPUS_DATA_OFFSET=$(($GRAPH_COMMIT_DATA_OFFSET + \
$GRAPH_COMMIT_DATA_WIDTH * $NUM_COMMITS))
GRAPH_BYTE_OCTOPUS=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4))
GRAPH_BYTE_FOOTER=$(($GRAPH_OCTOPUS_DATA_OFFSET + 4 * $NUM_OCTOPUS_EDGES))
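# For illustration only: with SHA-1 (HASH_LEN=20) and the 9 commits in
# this graph, the values above work out to the fanout starting at byte
# 68, OID lookup at 1092, commit data at 1272 (36 bytes per commit) and
# octopus edge data at 1596.  Something like
#   od -A d -t x1 full/$objdir/info/commit-graph | head
# is a handy way to eyeball those bytes when debugging; the tests do not
# rely on it.

# corrupt_graph_setup backs up the commit-graph file (restoring it once
# the test finishes) and makes the copy in place writable so the test
# can overwrite individual bytes.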
corrupt_graph_setup() {
test_when_finished mv commit-graph-backup full/$objdir/info/commit-graph &&
cp full/$objdir/info/commit-graph commit-graph-backup &&
chmod u+w full/$objdir/info/commit-graph
}
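# corrupt_graph_verify expects 'git commit-graph verify' to fail with a
# message matching $1, runs a plain 'git status' to make sure commands
# that merely read the corrupted graph do not crash, then rewrites the
# graph and checks that 'verify' passes again.  Unless called with
# "no-copy" it also keeps the corrupted file around as
# commit-graph-pre-write-test for the fsck tests near the end.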
corrupt_graph_verify() {
grepstr=$1
test_must_fail git -C full commit-graph verify 2>test_err &&
grep -v "^+" test_err >err &&
test_i18ngrep "$grepstr" err &&
if test "$2" != "no-copy"
then
cp full/$objdir/info/commit-graph commit-graph-pre-write-test
fi &&
git -C full status --short &&
GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE=true git -C full commit-graph write &&
chmod u+w full/$objdir/info/commit-graph &&
git -C full commit-graph verify
}
# usage: corrupt_graph_and_verify <position> <data> <string> [<zero_pos>]
# Manipulates the commit-graph file at the position
# by inserting the data, optionally zeroing the file
# starting at <zero_pos>, then runs 'git commit-graph verify'
# and places the output in the file 'err'. Test 'err' for
# the given string.
corrupt_graph_and_verify() {
pos=$1
data="${2:-\0}"
grepstr=$3
corrupt_graph_setup &&
orig_size=$(wc -c <full/$objdir/info/commit-graph) &&
zero_pos=${4:-${orig_size}} &&
printf "$data" | dd of="full/$objdir/info/commit-graph" bs=1 seek="$pos" conv=notrunc &&
dd of="full/$objdir/info/commit-graph" bs=1 seek="$zero_pos" if=/dev/null &&
test-tool genzeros $(($orig_size - $zero_pos)) >>"full/$objdir/info/commit-graph" &&
corrupt_graph_verify "$grepstr"
}
test_expect_success POSIXPERM,SANITY 'detect permission problem' '
corrupt_graph_setup &&
chmod 000 full/$objdir/info/commit-graph &&
corrupt_graph_verify "Could not open" "no-copy"
'
test_expect_success 'detect too small' '
corrupt_graph_setup &&
echo "a small graph" >full/$objdir/info/commit-graph &&
corrupt_graph_verify "too small"
'
test_expect_success 'detect bad signature' '
corrupt_graph_and_verify 0 "\0" \
"graph signature"
'
test_expect_success 'detect bad version' '
corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
"graph version"
'
test_expect_success 'detect bad hash version' '
corrupt_graph_and_verify $GRAPH_BYTE_HASH "\03" \
"hash version"
'
test_expect_success 'detect low chunk count' '
corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\01" \
"final chunk has non-zero id"
'
test_expect_success 'detect missing OID fanout chunk' '
corrupt_graph_and_verify $GRAPH_BYTE_OID_FANOUT_ID "\0" \
"commit-graph required OID fanout chunk missing or corrupted"
'
test_expect_success 'detect missing OID lookup chunk' '
corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ID "\0" \
"commit-graph required OID lookup chunk missing or corrupted"
'
test_expect_success 'detect missing commit data chunk' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATA_ID "\0" \
"commit-graph required commit data chunk missing or corrupted"
'
test_expect_success 'detect incorrect fanout' '
corrupt_graph_and_verify $GRAPH_BYTE_FANOUT1 "\01" \
"fanout value"
'
test_expect_success 'detect incorrect fanout final value' '
corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \
"OID lookup chunk is the wrong size"
'
test_expect_success 'detect incorrect OID order' '
corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ORDER "\01" \
"incorrect OID order"
'
test_expect_success 'detect OID not in object database' '
corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_MISSING "\01" \
"from object database"
'
test_expect_success 'detect incorrect tree OID' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_TREE "\01" \
"root tree OID for commit"
'
test_expect_success 'detect incorrect parent int-id' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_PARENT "\01" \
"invalid parent"
'
test_expect_success 'detect extra parent int-id' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_EXTRA_PARENT "\00" \
"is too long"
'
test_expect_success 'detect wrong parent' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_WRONG_PARENT "\01" \
"commit-graph parent for"
'
test_expect_success 'detect incorrect generation number' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\070" \
"generation for commit"
'
test_expect_success 'detect incorrect commit date' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_DATE "\01" \
"commit date"
'
test_expect_success 'detect incorrect parent for octopus merge' '
corrupt_graph_and_verify $GRAPH_BYTE_OCTOPUS "\01" \
"invalid parent"
'
test_expect_success 'detect invalid checksum hash' '
corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
"incorrect checksum"
'
test_expect_success 'detect incorrect chunk count' '
corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\377" \
"commit-graph file is too small to hold [0-9]* chunks" \
$GRAPH_CHUNK_LOOKUP_OFFSET
'
test_expect_success 'detect mixed generation numbers (non-zero to zero)' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION_LAST "\0\0\0\0" \
"both zero and non-zero generations"
'
test_expect_success 'detect mixed generation numbers (zero to non-zero)' '
corrupt_graph_and_verify $GRAPH_BYTE_COMMIT_GENERATION "\0\0\0\0" \
"both zero and non-zero generations"
'
test_expect_success 'git fsck (checks commit-graph when config set to true)' '
git -C full fsck &&
corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
"incorrect checksum" &&
cp commit-graph-pre-write-test full/$objdir/info/commit-graph &&
test_must_fail git -C full -c core.commitGraph=true fsck
'
test_expect_success 'git fsck (ignores commit-graph when config set to false)' '
git -C full fsck &&
corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
"incorrect checksum" &&
cp commit-graph-pre-write-test full/$objdir/info/commit-graph &&
git -C full -c core.commitGraph=false fsck
'
test_expect_success 'git fsck (checks commit-graph when config unset)' '
test_when_finished "git -C full config core.commitGraph true" &&
git -C full fsck &&
corrupt_graph_and_verify $GRAPH_BYTE_FOOTER "\00" \
"incorrect checksum" &&
test_unconfig -C full core.commitGraph &&
cp commit-graph-pre-write-test full/$objdir/info/commit-graph &&
test_must_fail git -C full fsck
'
test_expect_success 'git fsck shows commit-graph output with --progress' '
git -C "$TRASH_DIRECTORY/full" fsck --progress 2>err &&
grep "Verifying commits in commit graph" err
'
test_expect_success 'git fsck suppresses commit-graph output with --no-progress' '
git -C "$TRASH_DIRECTORY/full" fsck --no-progress 2>err &&
! grep "Verifying commits in commit graph" err
'
test_expect_success 'setup non-the_repository tests' '
rm -rf repo &&
git init repo &&
test_commit -C repo one &&
test_commit -C repo two &&
git -C repo config core.commitGraph true &&
git -C repo rev-parse two | \
git -C repo commit-graph write --stdin-commits
'
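# The next two tests compare commit-graph lookups performed through an
# explicitly opened repository (test-tool repository ...) against the
# answers plain git gives for the same commits: committer date and
# parents for parse_commit_in_graph, the root tree for
# get_commit_tree_in_graph.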
test_expect_success 'parse_commit_in_graph works for non-the_repository' '
test-tool repository parse_commit_in_graph \
repo/.git repo "$(git -C repo rev-parse two)" >actual &&
{
git -C repo log --pretty=format:"%ct " -1 &&
git -C repo rev-parse one
} >expect &&
test_cmp expect actual &&
test-tool repository parse_commit_in_graph \
repo/.git repo "$(git -C repo rev-parse one)" >actual &&
git -C repo log --pretty="%ct" -1 one >expect &&
test_cmp expect actual
'
test_expect_success 'get_commit_tree_in_graph works for non-the_repository' '
test-tool repository get_commit_tree_in_graph \
repo/.git repo "$(git -C repo rev-parse two)" >actual &&
git -C repo rev-parse two^{tree} >expect &&
test_cmp expect actual &&
test-tool repository get_commit_tree_in_graph \
repo/.git repo "$(git -C repo rev-parse one)" >actual &&
git -C repo rev-parse one^{tree} >expect &&
test_cmp expect actual
'
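# 'git hash-object -w -t commit --literally' stores an object without
# the usual consistency checks, which lets the next two tests create a
# commit that points at a nonexistent parent, or one with no tree at
# all, and confirm that 'commit-graph write' reports a parse error
# instead of producing a bogus graph.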
test_expect_success 'corrupt commit-graph write (broken parent)' '
rm -rf repo &&
git init repo &&
(
cd repo &&
empty="$(git mktree </dev/null)" &&
cat >broken <<-EOF &&
tree $empty
parent $ZERO_OID
author whatever <whatever@example.com> 1234 -0000
committer whatever <whatever@example.com> 1234 -0000
broken commit
EOF
broken="$(git hash-object -w -t commit --literally broken)" &&
git commit-tree -p "$broken" -m "good commit" "$empty" >good &&
test_must_fail git commit-graph write --stdin-commits \
<good 2>test_err &&
test_i18ngrep "unable to parse commit" test_err
)
'
test_expect_success 'corrupt commit-graph write (missing tree)' '
rm -rf repo &&
git init repo &&
(
cd repo &&
tree="$(git mktree </dev/null)" &&
cat >broken <<-EOF &&
parent $ZERO_OID
author whatever <whatever@example.com> 1234 -0000
committer whatever <whatever@example.com> 1234 -0000
broken commit
EOF
broken="$(git hash-object -w -t commit --literally broken)" &&
git commit-tree -p "$broken" -m "good" "$tree" >good &&
test_must_fail git commit-graph write --stdin-commits \
<good 2>test_err &&
test_i18ngrep "unable to parse commit" test_err
)
'
# We test the overflow-related code with the following repo history:
#
#   4:F - 5:N - 6:U
#  /                \
# 1:U - 2:N - 3:U    M:N
#  \                /
#   7:N - 8:F - 9:N
#
# Here the commits denoted by U have committer date of zero seconds
# since Unix epoch, the commits denoted by N have committer date
# starting from 1112354055 seconds since Unix epoch (default committer
# date for the test suite), and the commits denoted by F have committer
# date of (2 ^ 31 - 2) seconds since Unix epoch.
#
# The largest offset observed is 2 ^ 31, just large enough to overflow.
#
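# A sketch of the arithmetic, assuming corrected committer dates are
# computed as max(own committer date, largest parent corrected date + 1):
# commit 4 keeps its own date of 2^31 - 2, commit 5 is pulled up to
# 2^31 - 1, and commit 6, whose own date is zero, is pulled up to 2^31.
# Commit 6's offset is therefore 2^31 - 0 = 2^31, one more than the
# largest value that fits in the generation data chunk, so it has to be
# spilled to the overflow chunk.
#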
test_expect_success 'set up and verify repo with generation data overflow chunk' '
UNIX_EPOCH_ZERO="@0 +0000" &&
FUTURE_DATE="@2147483646 +0000" &&
git init repo &&
(
cd repo &&
test_commit --date "$UNIX_EPOCH_ZERO" 1 &&
test_commit 2 &&
test_commit --date "$UNIX_EPOCH_ZERO" 3 &&
git commit-graph write --reachable &&
graph_read_expect 3 generation_data &&
test_commit --date "$FUTURE_DATE" 4 &&
test_commit 5 &&
test_commit --date "$UNIX_EPOCH_ZERO" 6 &&
git branch left &&
git reset --hard 3 &&
test_commit 7 &&
test_commit --date "$FUTURE_DATE" 8 &&
test_commit 9 &&
git branch right &&
git reset --hard 3 &&
test_merge M left right &&
git commit-graph write --reachable &&
graph_read_expect 10 "generation_data generation_data_overflow" &&
git commit-graph verify
)
'
graph_git_behavior 'generation data overflow chunk repo' repo left right
test_expect_success 'overflow during generation version upgrade' '
git init overflow-v2-upgrade &&
(
cd overflow-v2-upgrade &&
# This commit will have a date at two seconds past the Epoch,
# and a (v1) generation number of 1, since it is a root commit.
#
# The offset will then be computed as 1-2, which will underflow
# to 2^31, which is greater than the v2 offset small limit of
# 2^31-1.
#
# This is sufficient to need a large offset table for the v2
# generation numbers.
test_commit --date "@2 +0000" base &&
git repack -d &&
# Test that upgrading from generation v1 to v2 correctly
# produces the overflow table.
git -c commitGraph.generationVersion=1 commit-graph write &&
git -c commitGraph.generationVersion=2 commit-graph write \
--changed-paths &&
git rev-list --all
)
'
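# Helpers for the chunk-corruption tests below. corrupt_chunk rewrites the
# commit-graph in the "full" repo and then hands its arguments through to
# corrupt_chunk_file from lib-chunk.sh; as used here, those arguments are a
# four-byte chunk id (e.g. OIDF) followed by either "clear" plus optional
# replacement bytes in hex, or an offset into the chunk plus hex bytes to
# overwrite in place. For example, something like
#
#	corrupt_chunk_file full/.git/objects/info/commit-graph CDAT 0 ff
#
# would overwrite the first byte of the commit-data chunk with 0xff.
# check_corrupt_chunk additionally checks that "git log" output is the same
# whether or not the corrupted graph is used.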
corrupt_chunk () {
graph=full/.git/objects/info/commit-graph &&
test_when_finished "rm -rf $graph" &&
git -C full commit-graph write --reachable &&
corrupt_chunk_file $graph "$@"
}
check_corrupt_chunk () {
corrupt_chunk "$@" &&
git -C full -c core.commitGraph=false log >expect.out &&
git -C full -c core.commitGraph=true log >out 2>err &&
test_cmp expect.out out
}
test_expect_success 'reader notices too-small oid fanout chunk' '
# make it big enough that the graph file is plausible,
# otherwise we hit an earlier check
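# (250 four-byte entries is 1000 bytes, a bit short of the 256 * 4 = 1024
# bytes a complete fanout table occupies)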
check_corrupt_chunk OIDF clear $(printf "000000%02x" $(test_seq 250)) &&
cat >expect.err <<-\EOF &&
error: commit-graph oid fanout chunk is wrong size
error: commit-graph required OID fanout chunk missing or corrupted
EOF
test_cmp expect.err err
'
test_expect_success 'reader notices fanout/lookup table mismatch' '
check_corrupt_chunk OIDF 1020 "FFFFFFFF" &&
cat >expect.err <<-\EOF &&
error: commit-graph OID lookup chunk is the wrong size
error: commit-graph required OID lookup chunk missing or corrupted
EOF
test_cmp expect.err err
'
test_expect_success 'reader notices out-of-bounds fanout' '
# Rather than try to corrupt a specific hash, we will just
# wreck them all. But we cannot just set them all to 0xFFFFFFFF or
# similar, as they are used for hi/lo starts in a binary search (so if
# they are identical, that indicates that the search should abort
# immediately). Instead, we will give them high values that differ by
# 2^24, ensuring that any that are used would cause an out-of-bounds
# read.
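# Note that the printf below fills only the first 255 of the 256 fanout
# slots, leaving the final one (the total commit count) untouched, so the
# size check against the lookup table still passes; the table is simply no
# longer monotonic, which is what the reader reports.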
check_corrupt_chunk OIDF 0 $(printf "%02x000000" $(test_seq 0 254)) &&
cat >expect.err <<-\EOF &&
error: commit-graph fanout values out of order
EOF
test_cmp expect.err err
'
test_expect_success 'reader notices too-small commit data chunk' '
check_corrupt_chunk CDAT clear 00000000 &&
cat >expect.err <<-\EOF &&
error: commit-graph commit data chunk is wrong size
error: commit-graph required commit data chunk missing or corrupted
EOF
test_cmp expect.err err
'
test_expect_success 'reader notices out-of-bounds extra edge' '
check_corrupt_chunk EDGE clear &&
cat >expect.err <<-\EOF &&
error: commit-graph extra-edges pointer out of bounds
EOF
test_cmp expect.err err
'
test_expect_success 'reader notices too-small generations chunk' '
check_corrupt_chunk GDA2 clear 00000000 &&
cat >expect.err <<-\EOF &&
error: commit-graph generations chunk is wrong size
EOF
test_cmp expect.err err
'
test_done