git/t/lib-commit-graph.sh

#!/bin/sh

# Helper functions for testing commit-graphs.

# Prime the OID cache with oid_version, the hash version recorded in the
# commit-graph header (1 for SHA-1, 2 for SHA-256).
test_oid_cache <<-EOF
oid_version sha1:1
oid_version sha256:2
EOF

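# graph_git_two_modes <args>
#
# Runs "git <args>" once with core.commitGraph enabled and once with it
# disabled, and checks that both runs produce identical output.
# <args> is expanded unquoted, so it is split on $IFS into separate
# arguments.
graph_git_two_modes() {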
	git -c core.commitGraph=true $1 >output &&
	git -c core.commitGraph=false $1 >expect &&
	test_cmp expect output
}

# graph_git_behavior <name> <directory> <branch> <compare>
#
# Ensures that a handful of traversal operations produce the same
# results with and without the commit-graph in use.
#
# NOTE: it is a bug to call this function with <directory> containing
# any characters in $IFS.
graph_git_behavior() {
	MSG=$1
	DIR=$2
	BRANCH=$3
	COMPARE=$4
	test_expect_success "check normal git operations: $MSG" '
		graph_git_two_modes "${DIR:+-C $DIR} log --oneline $BRANCH" &&
		graph_git_two_modes "${DIR:+-C $DIR} log --topo-order $BRANCH" &&
		graph_git_two_modes "${DIR:+-C $DIR} log --graph $COMPARE..$BRANCH" &&
		graph_git_two_modes "${DIR:+-C $DIR} branch -vv" &&
		graph_git_two_modes "${DIR:+-C $DIR} merge-base -a $BRANCH $COMPARE"
	'
}
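
# Example (a sketch only: the "full" directory and the "commits/3" and
# "commits/1" branch names are hypothetical, not defined in this file):
#
#	graph_git_behavior "graph exists" full commits/3 commits/1
#
# With DIR=full, "${DIR:+-C $DIR}" expands to "-C full", so the first
# comparison runs
#
#	git -c core.commitGraph=true -C full log --oneline commits/3
#	git -c core.commitGraph=false -C full log --oneline commits/3
#
# and compares their output. With an empty <directory>, the expansion
# yields nothing and the commands run in the current directory.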

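# graph_read_expect [-C <directory>] <num_commits> [<chunks>] [<generation_version>]
#
# Writes the expected "test-tool read-graph" output to <directory>/expect
# (default: the current directory) and compares it against the actual
# output. <chunks> is a space-separated list of chunk names expected in
# addition to the three base chunks; <generation_version> defaults to 2.
graph_read_expect() {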
	OPTIONAL=""
	NUM_CHUNKS=3
	DIR="."
	if test "$1" = -C
	then
		shift
		DIR="$1"
		shift
	fi
	if test -n "$2"
	then
		OPTIONAL=" $2"
		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
	fi
	GENERATION_VERSION=2
	if test -n "$3"
	then
		GENERATION_VERSION=$3
	fi
	OPTIONS=
	if test $GENERATION_VERSION -gt 1
	then
		OPTIONS=" read_generation_data"
	fi
	cat >"$DIR/expect" <<-EOF
	header: 43475048 1 $(test_oid oid_version) $NUM_CHUNKS 0
	num_commits: $1
	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
	options:$OPTIONS
	EOF
	(
		cd "$DIR" &&
		test-tool read-graph >output &&
		test_cmp expect output
	)
}
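
# Example (a sketch only: the "full" directory, the commit count, and the
# choice of extra chunk are hypothetical):
#
#	graph_read_expect -C full 3 generation_data
#
# In a SHA-1 repository this writes the following to full/expect, where
# the chunk count 4 is the 3 base chunks plus 1 extra chunk:
#
#	header: 43475048 1 1 4 0
#	num_commits: 3
#	chunks: oid_fanout oid_lookup commit_metadata generation_data
#	options: read_generation_data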