/* git/t/helper/test-read-graph.c */

#include "test-tool.h"
#include "commit-graph.h"
#include "repository.h"
#include "object-store-ll.h"
#include "bloom.h"
#include "setup.h"

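/*
 * Print a summary of the repository's commit-graph for use in tests:
 * the raw header fields, the number of commits, which chunks were
 * found, and selected fields of the parsed struct commit_graph.
 * The helper takes no options, so argc and argv are marked UNUSED.
 * Returns 1 if no commit-graph could be read.
 */
int cmd__read_graph(int argc UNUSED, const char **argv UNUSED)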
{
	struct commit_graph *graph = NULL;
	struct object_directory *odb;

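	setup_git_directory();
	/* The repository's primary object directory. */
	odb = the_repository->objects->odb;

	/*
	 * Load the repository settings before reading the graph, because
	 * the commit-graph machinery's behavior depends on them.
	 */
	prepare_repo_settings(the_repository);

	/*
	 * read_commit_graph_one() supports commit-graph chains and never
	 * calls die(); it returns NULL when no graph can be read.
	 */
	graph = read_commit_graph_one(the_repository, odb);
	if (!graph)
		return 1;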

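	/*
	 * The fixed 8-byte file header: the 4-byte signature ("CGPH",
	 * printed as 43475048), then one byte each for the format version,
	 * the hash version (1 = SHA-1, 2 = SHA-256), the number of chunks,
	 * and the number of base commit-graphs.
	 */
	printf("header: %08x %d %d %d %d\n",
		ntohl(*(uint32_t*)graph->data),
		*(unsigned char*)(graph->data + 4),
		*(unsigned char*)(graph->data + 5),
		*(unsigned char*)(graph->data + 6),
		*(unsigned char*)(graph->data + 7));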
	printf("num_commits: %u\n", graph->num_commits);
	/* List every chunk whose pointer was set while parsing the file. */
	printf("chunks:");

	if (graph->chunk_oid_fanout)
		printf(" oid_fanout");
	if (graph->chunk_oid_lookup)
		printf(" oid_lookup");
	if (graph->chunk_commit_data)
		printf(" commit_metadata");
	if (graph->chunk_generation_data)
		printf(" generation_data");
	if (graph->chunk_generation_data_overflow)
		printf(" generation_data_overflow");
	if (graph->chunk_extra_edges)
		printf(" extra_edges");
	if (graph->chunk_bloom_indexes)
		printf(" bloom_indexes");
	if (graph->chunk_bloom_data)
		printf(" bloom_data");
	printf("\n");

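	/*
	 * The presence of a chunk does not prove the matching feature was
	 * enabled, so also report post-parse state: the Bloom filter
	 * settings (hash version, bits per entry, number of hashes) and
	 * whether the read_generation_data and topo_levels fields are set.
	 */
	printf("options:");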
	if (graph->bloom_filter_settings)
		printf(" bloom(%"PRIu32",%"PRIu32",%"PRIu32")",
		       graph->bloom_filter_settings->hash_version,
		       graph->bloom_filter_settings->bits_per_entry,
		       graph->bloom_filter_settings->num_hashes);
	if (graph->read_generation_data)
		printf(" read_generation_data");
	if (graph->topo_levels)
		printf(" topo_levels");
	printf("\n");

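	/*
	 * The helper exits right after this; UNLEAK() marks the graph as
	 * intentionally not freed so leak checkers do not report it.
	 */
	UNLEAK(graph);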

	return 0;
}
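
/*
 * Illustrative sketch, not part of Git: the helpers below are
 * hypothetical and only show how the "header:" and "chunks:" lines
 * printed above relate to the raw file bytes. They assume the layout
 * described in Git's commit-graph format documentation: an 8-byte
 * header (big-endian signature 0x43475048 "CGPH", then one byte each
 * for version, hash version, chunk count and base-graph count),
 * followed by a table of contents of (chunk count + 1) 12-byte
 * entries, each a 4-byte big-endian chunk ID plus an 8-byte offset,
 * where the last entry has ID 0. Git's own parsing code validates much
 * more (signature, version, offsets); this sketch does not.
 */
#include <stddef.h>
#include <stdint.h>

struct cg_header_sketch {
	uint32_t signature;		/* 0x43475048 ("CGPH") in a valid file */
	unsigned char version;
	unsigned char hash_version;
	unsigned char num_chunks;
	unsigned char num_base_graphs;
};

/* Read a big-endian 32-bit value byte by byte (no alignment needed). */
static uint32_t be32_sketch(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the 8-byte header; returns 0 on success, -1 if buf is too short. */
static int cg_parse_header_sketch(const unsigned char *buf, size_t len,
				  struct cg_header_sketch *out)
{
	if (len < 8)
		return -1;
	out->signature = be32_sketch(buf);
	out->version = buf[4];
	out->hash_version = buf[5];
	out->num_chunks = buf[6];
	out->num_base_graphs = buf[7];
	return 0;
}

/*
 * Scan the table of contents for chunk_id (e.g. 0x4f494446 for "OIDF").
 * Returns 1 if present, 0 if absent, -1 if the buffer is truncated.
 */
static int cg_has_chunk_sketch(const unsigned char *buf, size_t len,
			       uint32_t chunk_id)
{
	struct cg_header_sketch h;
	size_t i;

	if (cg_parse_header_sketch(buf, len, &h) < 0)
		return -1;
	/* num_chunks entries, then a terminating entry whose ID is 0 */
	for (i = 0; i <= (size_t)h.num_chunks; i++) {
		size_t off = 8 + 12 * i;
		uint32_t id;

		if (off + 12 > len)
			return -1;
		id = be32_sketch(buf + off);
		if (!id)
			break;
		if (id == chunk_id)
			return 1;
	}
	return 0;
}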