development/git - HydraGit

mirror of https://github.com/git/git synced 2024-08-25 02:35:46 +00:00

Author	SHA1	Message	Date
Junio C Hamano	99c0d94eaa	Merge branch 'hx/lookup-commit-in-graph-fix' A corner case bug where lazily fetching objects from a promisor remote resulted in infinite recursion has been corrected. * hx/lookup-commit-in-graph-fix: t5330: remove run_with_limited_processses() commit-graph.c: no lazy fetch in lookup_commit_in_graph()	2022-07-19 16:40:16 -07:00
Han Xin	3a1ea94a49	commit-graph.c: no lazy fetch in lookup_commit_in_graph() The commit-graph is used to opportunistically optimize accesses to certain pieces of information on commit objects, and lookup_commit_in_graph() tries to say "no" when the requested commit does not locally exist by returning NULL, in which case the caller can ask for (which may result in on-demand fetching from a promisor remote) and parse the commit object itself. However, it uses a wrong helper, repo_has_object_file(), to do so. This helper not only checks if an object is mmediately available in the local object store, but also tries to fetch from a promisor remote. But the fetch machinery calls lookup_commit_in_graph(), thus causing an infinite loop. We should make lookup_commit_in_graph() expect that a commit given to it can be legitimately missing from the local object store, by using the has_object_file() helper instead. Signed-off-by: Han Xin <hanxin.hx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-30 23:00:32 -07:00
Junio C Hamano	a50036da1a	Merge branch 'tb/cruft-packs' A mechanism to pack unreachable objects into a "cruft pack", instead of ejecting them into loose form to be reclaimed later, has been introduced. * tb/cruft-packs: sha1-file.c: don't freshen cruft packs builtin/gc.c: conditionally avoid pruning objects via loose builtin/repack.c: add cruft packs to MIDX during geometric repack builtin/repack.c: use named flags for existing_packs builtin/repack.c: allow configuring cruft pack generation builtin/repack.c: support generating a cruft pack builtin/pack-objects.c: --cruft with expiration reachable: report precise timestamps from objects in cruft packs reachable: add options to add_unseen_recent_objects_to_traversal builtin/pack-objects.c: --cruft without expiration builtin/pack-objects.c: return from create_object_entry() t/helper: add 'pack-mtimes' test-tool pack-mtimes: support writing pack .mtimes files chunk-format.h: extract oid_version() pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' pack-mtimes: support reading .mtimes files Documentation/technical: add cruft-packs.txt	2022-06-03 14:30:37 -07:00
Taylor Blau	d9fef9d90d	chunk-format.h: extract oid_version() There are three definitions of an identical function which converts `the_hash_algo` into either 1 (for SHA-1) or 2 (for SHA-256). There is a copy of this function for writing both the commit-graph and multi-pack-index file, and another inline definition used to write the .rev header. Consolidate these into a single definition in chunk-format.h. It's not clear that this is the best header to define this function in, but it should do for now. (Worth noting, the .rev caller expects a 4-byte unsigned, but the other two callers work with a single unsigned byte. The consolidated version uses the latter type, and lets the compiler widen it when required). Another caller will be added in a subsequent patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 15:48:26 -07:00
Junio C Hamano	1b8138fb08	Merge branch 'ab/valgrind-fixes' A bit of test framework fixes with a few fixes to issues found by valgrind. * ab/valgrind-fixes: commit-graph.c: don't assume that stat() succeeds object-file: fix a unpack_loose_header() regression in `3b6a8db3b0` log test: skip a failing mkstemp() test under valgrind tests: using custom GIT_EXEC_PATH breaks --valgrind tests	2022-05-23 14:39:54 -07:00
Junio C Hamano	538dc459a0	Merge branch 'ep/maint-equals-null-cocci' Introduce and apply coccinelle rule to discourage an explicit comparison between a pointer and NULL, and applies the clean-up to the maintenance track. * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci	2022-05-20 15:26:59 -07:00
Ævar Arnfjörð Bjarmason	7c898554d7	commit-graph.c: don't assume that stat() succeeds Fix code added in `8d84097f96` (commit-graph: expire commit-graph files, 2019-06-18) to check the return value of the stat() system call. Not doing so caused us to use uninitialized memory in the "Bloom generation is limited by --max-new-filters" test in t4216-log-bloom.sh: + rm -f trace.event + pwd + GIT_TRACE2_EVENT=[...]/t/trash directory.t4216-log-bloom/limits/trace.event git commit-graph write --reachable --split=replace --changed-paths --max-new-filters=2 ==24835== Syscall param utimensat(times[0].tv_sec) points to uninitialised byte(s) ==24835== at 0x499E65A: __utimensat64_helper (utimensat.c:34) ==24835== by 0x4999142: utime (utime.c:36) ==24835== by 0x552BE0: mark_commit_graphs (commit-graph.c:2213) ==24835== by 0x550822: write_commit_graph (commit-graph.c:2424) ==24835== by 0x54E3A0: write_commit_graph_reachable (commit-graph.c:1681) ==24835== by 0x4374BB: graph_write (commit-graph.c:269) ==24835== by 0x436F7D: cmd_commit_graph (commit-graph.c:326) ==24835== by 0x407B9A: run_builtin (git.c:465) ==24835== by 0x406651: handle_builtin (git.c:719) ==24835== by 0x407575: run_argv (git.c:786) ==24835== by 0x406410: cmd_main (git.c:917) ==24835== by 0x511F09: main (common-main.c:56) ==24835== Address 0x1ffeffde70 is on thread 1's stack ==24835== in frame #1, created by utime (utime.c:25) ==24835== Uninitialised value was created by a stack allocation ==24835== at 0x552B50: mark_commit_graphs (commit-graph.c:2201) ==24835== [...] error: last command exited with $?=126 not ok 137 - Bloom generation is limited by --max-new-filters This would happen as we stat'd the non-existing ".git/objects/info/commit-graph" file. Let's fix mark_commit_graphs() to check the stat()'s return value, and while we're at it fix another case added in the same commit to do the same. The caller in expire_commit_graphs() would have been less likely to run into this, as it's operating on files it just got from readdir(), but it could still happen due to a race with e.g. a concurrent "rm -rf" of the commit-graph files. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-12 15:42:26 -07:00
Junio C Hamano	2b0a58d164	Merge branch 'ep/maint-equals-null-cocci' for maint-2.35 * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci	2022-05-02 10:06:04 -07:00
Junio C Hamano	afe8a9070b	tree-wide: apply equals-null.cocci Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-02 09:50:37 -07:00
Kleber Tarcísio	c0befa0c03	commit-graph: close file before returning NULL There are two reasons that we could return NULL early within load_commit_graph_chain(): 1. The file does not exist, so the file pointer is NULL. 2. The file exists, but is too small to contain a single hash. These were grouped together when the function was first written in `5c84b3396` (commit-graph: load commit-graph chains, 2019-06-18) in order to simplify how the 'chain_name' string is freed. However, the current code leaves a narrow window where the file pointer is not closed when the file exists, but is rejected for being too small. Split out these cases separately to ensure we close the file in this case. Signed-off-by: Kleber Tarcísio <klebertarcisio@yahoo.com.br> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-20 13:56:24 -07:00
Junio C Hamano	eb804cd405	Merge branch 'ns/core-fsyncmethod' Replace core.fsyncObjectFiles with two new configuration variables, core.fsync and core.fsyncMethod. * ns/core-fsyncmethod: core.fsync: documentation and user-friendly aggregate options core.fsync: new option to harden the index core.fsync: add configuration parsing core.fsync: introduce granular fsync control infrastructure core.fsyncmethod: add writeout-only mode wrapper: make inclusion of Windows csprng header tightly scoped	2022-03-25 16:38:24 -07:00
Junio C Hamano	38bbb9e990	Merge branch 'ab/string-list-count-in-size-t' Count string_list items in size_t, not "unsigned int". * ab/string-list-count-in-size-t: string-list API: change "nr" and "alloc" to "size_t" gettext API users: don't explicitly cast ngettext()'s "n"	2022-03-16 17:53:09 -07:00
Junio C Hamano	a54cc523ad	Merge branch 'ds/commit-graph-gen-v2-fixes' Fixes to the way generation number v2 in the commit-graph files are (not) handled. * ds/commit-graph-gen-v2-fixes: commit-graph: declare bankruptcy on GDAT chunks commit-graph: fix generation number v2 overflow values commit-graph: start parsing generation v2 (again) commit-graph: fix ordering bug in generation numbers t5318: extract helpers to lib-commit-graph.sh test-read-graph: include extra post-parse info	2022-03-16 17:53:09 -07:00
Neeraj Singh	020406eaa5	core.fsync: introduce granular fsync control infrastructure This commit introduces the infrastructure for the core.fsync configuration knob. The repository components we want to sync are identified by flags so that we can turn on or off syncing for specific components. If core.fsyncObjectFiles is set and the core.fsync configuration also includes FSYNC_COMPONENT_LOOSE_OBJECT, we will fsync any loose objects. This picks the strictest data integrity behavior if core.fsync and core.fsyncObjectFiles are set to conflicting values. This change introduces the currently unused fsync_component helper, which will be used by a later patch that adds fsyncing to the refs backend. Actual configuration and documentation of the fsync components list are in other patches in the series to separate review of the underlying mechanism from the policy of how it's configured. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-10 15:10:22 -08:00
Ævar Arnfjörð Bjarmason	99d60545f8	string-list API: change "nr" and "alloc" to "size_t" Change the "nr" and "alloc" members of "struct string_list" to use "size_t" instead of "nr". On some platforms the size of an "unsigned int" will be smaller than a "size_t", e.g. a 32 bit unsigned v.s. 64 bit unsigned. As "struct string_list" is a generic API we use in a lot of places this might cause overflows. As one example: code in "refs.c" keeps track of the number of refs with a "size_t", and auxiliary code in builtin/remote.c in get_ref_states() appends those to a "struct string_list". While we're at it split the "nr" and "alloc" in string-list.h across two lines, which is the case for most such struct member declarations (e.g. in "strbuf.h" and "strvec.h"). Changing e.g. "int i" to "size_t i" in run_and_feed_hook() isn't strictly necessary, and there are a lot more cases where we'll use a local "int", "unsigned int" etc. variable derived from the "nr" in the "struct string_list". But in that case as well as add_wrapped_shortlog_msg() in builtin/shortlog.c we need to adjust the printf format referring to "nr" anyway, so let's also change the other variables referring to it. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-07 12:02:04 -08:00
Derrick Stolee	6dbf4b8172	commit-graph: declare bankruptcy on GDAT chunks The Generation Data (GDAT) and Generation Data Overflow (GDOV) chunks store corrected commit date offsets, used for generation number v2. Recent changes have demonstrated that previous versions of Git were incorrectly parsing data from these chunks, but might have also been writing them incorrectly. I asserted [1] that the previous fixes were sufficient because the known reasons for incorrectly writing generation number v2 data relied on parsing the information incorrectly out of a commit-graph file, but the previous versions of Git were not reading the generation number v2 data. However, Patrick demonstrated [2] a case where in split commit-graphs across an alternate boundary (and possibly some other special conditions) it was possible to have a commit-graph that was generated by a previous version of Git have incorrect generation number v2 data which results in errors like the following: commit-graph generation for commit <oid> is 1623273624 < 1623273710 [1] https://lore.kernel.org/git/f50e74f0-9ffa-f4f2-4663-269801495ed3@github.com/ [2] https://lore.kernel.org/git/Yh93vOkt2DkrGPh2@ncase/ Clearly, there is something else going on. The situation is not completely understood, but the errors do not reproduce if the commit-graphs are all generated by a Git version including these recent fixes. If we cannot trust the existing data in the GDAT and GDOV chunks, then we can alter the format to change the chunk IDs for these chunks. This causes the new version of Git to silently ignore the older chunks (and disabling generation number v2 in the process) while writing new commit-graph files with correct data in the GDA2 and GDO2 chunks. Update commit-graph-format.txt including a historical note about these deprecated chunks. Reported-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-07 09:17:03 -08:00
Ævar Arnfjörð Bjarmason	ef3fe21448	lockfile API users: simplify and don't leak "path" Fix a memory leak in code added in `6c622f9f0b` (commit-graph: write commit-graph chains, 2019-06-18). We needed to free the "lock_name" if we encounter errors, and the "graph_name" after we'd run unlink() on it. For the case of write_commit_graph_file() refactoring the code to free the "lock_name" after we were done using the "struct lock_file lk" would have made the control flow more complex. Luckily we can free the "lock_file" right after the hold_lock_file_for_update() call, if it makes use of "path" at all it'll have copied its contents to a "struct strbuf" of its own. While I'm at it let's fix code added in `fb10ca5b54` (sparse-checkout: write using lockfile, 2019-11-21) in write_patterns_and_update() to avoid the same complexity that I thought I needed when I wrote the initial fix for write_commit_graph_file(). We can free the "sparse_filename" right after calling hold_lock_file_for_update(), we don't need to wait until we're exiting the function to do so. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-04 13:24:19 -08:00
Ævar Arnfjörð Bjarmason	51a94d8ffe	commit-graph: stop fill_oids_from_packs() progress on error and free() Fix a bug in fill_oids_from_packs(), we should always stop_progress(), but did not do so if we returned an error here. This also plugs a memory leak in those cases by releasing the two "struct strbuf" variables the function uses. While I'm at it stop hardcoding "-1" here and just use the return value of error() instead, which happens to be "-1". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-04 13:24:19 -08:00
Ævar Arnfjörð Bjarmason	4a0479086a	commit-graph: fix memory leak in misused string_list API When this code was migrated to the string_list API in `d88b14b3fd` (commit-graph: use string-list API for input, 2018-06-27) it was made to use used both STRING_LIST_INIT_NODUP and a strbuf_detach() pattern. Those should not be used together if string_list_clear() is expected to free the memory, instead we need to either use STRING_LIST_INIT_DUP with a string_list_append_nodup(), or a STRING_LIST_INIT_NODUP and manually fiddle with the "strdup_strings" member before calling string_list_clear(). Let's do the former. Since "strdup_strings = 1" is set now other code might be broken by relying on "pack_indexes" not to duplicate it strings, but that doesn't happen. When we pass this down to write_commit_graph() that code uses the "struct string_list" without modifying it. Let's add a "const" to the variable to have the compiler enforce that assumption. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-04 13:24:18 -08:00
Derrick Stolee	c8d67b9a68	commit-graph: fix generation number v2 overflow values The Generation Data Chunk was implemented and tested in `e8b63005c` (commit-graph: implement generation data chunk, 2021-01-16), but the test was carefully constructed to work on systems with 32-bit dates. Since the corrected commit date offsets still required more than 31 bits, this triggered writing the generation_data_overflow chunk. However, upon closer look, the write_graph_chunk_generation_data_overflow() method writes the offsets to the chunk (as dictated by the format) but fill_commit_graph_info() treats the value in the chunk as if it is the full corrected commit date (not an offset). For some reason, this does not cause an issue when using the FUTURE_DATE specified in t5318-commit-graph.sh, but it does show up as a failure in 'git commit-graph verify' if we increase that FUTURE_DATE to be above four billion. Fix this error and create a 64-bit timestamp version of the test so we can test these larger values. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-01 12:15:06 -08:00
Derrick Stolee	3b0199d4c3	commit-graph: start parsing generation v2 (again) The 'read_generation_data' member of 'struct commit_graph' was introduced by `1fdc383c5` (commit-graph: use generation v2 only if entire chain does, 2021-01-16). The intention was to avoid using corrected commit dates if not all layers of a commit-graph had that data stored. The logic in validate_mixed_generation_chain() at that point incorrectly initialized read_generation_data to 1 if and only if the tip commit-graph contained the Corrected Commit Date chunk. This was "fixed" in `448a39e65` (commit-graph: validate layers for generation data, 2021-02-02) to validate that read_generation_data was either non-zero for all layers, or it would set read_generation_data to zero for all layers. The problem here is that read_generation_data is not initialized to be non-zero anywhere! This change initializes read_generation_data immediately after the chunk is parsed, so each layer will have its value present as soon as possible. The read_generation_data member is used in fill_commit_graph_info() to determine if we should use the corrected commit date or the topological levels stored in the Commit Data chunk. Due to this bug, all previous versions of Git were defaulting to topological levels in all cases! This can be measured with some performance tests. Using the Linux kernel as a testbed, I generated a complete commit-graph containing corrected commit dates and tested the 'new' version against the previous, 'old' version. First, rev-list with --topo-order demonstrates a 26% improvement using corrected commit dates: hyperfine \ -n "old" "$OLD_GIT rev-list --topo-order -1000 v3.6" \ -n "new" "$NEW_GIT rev-list --topo-order -1000 v3.6" \ --warmup=10 Benchmark 1: old Time (mean ± σ): 57.1 ms ± 3.1 ms Range (min … max): 52.9 ms … 62.0 ms 55 runs Benchmark 2: new Time (mean ± σ): 45.5 ms ± 3.3 ms Range (min … max): 39.9 ms … 51.7 ms 59 runs Summary 'new' ran 1.26 ± 0.11 times faster than 'old' These performance improvements are due to the algorithmic improvements given by walking fewer commits due to the higher cutoffs from corrected commit dates. However, this comes at a cost. The additional I/O cost of parsing the corrected commit dates is visible in case of merge-base commands that do not reduce the overall number of walked commits. hyperfine \ -n "old" "$OLD_GIT merge-base v4.8 v4.9" \ -n "new" "$NEW_GIT merge-base v4.8 v4.9" \ --warmup=10 Benchmark 1: old Time (mean ± σ): 110.4 ms ± 6.4 ms Range (min … max): 96.0 ms … 118.3 ms 25 runs Benchmark 2: new Time (mean ± σ): 150.7 ms ± 1.1 ms Range (min … max): 149.3 ms … 153.4 ms 19 runs Summary 'old' ran 1.36 ± 0.08 times faster than 'new' Performance issues like this are what motivated `702110aac` (commit-graph: use config to specify generation type, 2021-02-25). In the future, we could fix this performance problem by inserting the corrected commit date offsets into the Commit Date chunk instead of having that data in an extra chunk. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-01 12:15:06 -08:00
Derrick Stolee	75979d9460	commit-graph: fix ordering bug in generation numbers When computing the generation numbers for a commit-graph, we compute the corrected commit dates and then check if their offsets from the actual dates is too large to fit in the 32-bit Generation Data chunk. However, there is a problem with this approach: if we have parsed the generation data from the previous commit-graph, then we continue the loop because the corrected commit date is already computed. This causes an under-count in the number of overflow values. It is incorrect to add an increment to num_generation_data_overflows next to this 'continue' statement, because we might start double-counting commits that are computed because of the depth-first search walk from a commit with an earlier OID. Instead, iterate over the full commit list at the end, checking the offsets to see how many grow beyond the maximum value. Create a new t5328-commit-graph-64-bit-time.sh test script to handle special cases of testing 64-bit timestamps. This helps demonstrate this bug in more cases. It still won't hit all potential cases until the next change, which reenables reading generation numbers. Use the skip_all trick from `0a2bfccb9c` (t0051: use "skip_all" under !MINGW in single-test file, 2022-02-04) to make the output clean when run on a 32-bit system. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-01 12:14:57 -08:00
Lessley Dennington	0803f9c7cd	commit-graph: return if there is no git directory Return early if git directory does not exist. This will protect against test failures in the upcoming change to BUG in prepare_repo_settings if no git directory exists. Signed-off-by: Lessley Dennington <lessleydennington@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-06 09:55:05 -08:00
Junio C Hamano	c042ad5ad5	Merge branch 'js/run-command-close-packs' The run-command API has been updated so that the callers can easily ask the file descriptors open for packfiles to be closed immediately before spawning commands that may trigger auto-gc. * js/run-command-close-packs: Close object store closer to spawning child processes run_auto_maintenance(): implicitly close the object store run-command: offer to close the object store before running run-command: prettify the `RUN_COMMAND_*` flags pull: release packs before fetching commit-graph: when closing the graph, also release the slab	2021-09-20 15:20:45 -07:00
Junio C Hamano	df0c308c1a	Merge branch 'ab/progress-users-adjust-counters' The code to show progress indicator in a few code paths did not cover between 0-100%, which has been corrected. * ab/progress-users-adjust-counters: entry: show finer-grained counter in "Filtering content" progress line commit-graph: fix bogus counter in "Scanning merged commits" progress line	2021-09-20 15:20:41 -07:00
SZEDER Gábor	4011224944	commit-graph: fix bogus counter in "Scanning merged commits" progress line The final value of the counter of the "Scanning merged commits" progress line is always one less than its expected total, e.g.: Scanning merged commits: 83% (5/6), done. This happens because while iterating over an array the loop variable is passed to display_progress() as-is, but while C arrays (and thus the loop variable) start at 0 and end at N-1, the progress counter must end at N. Fix this by passing 'i + 1' to display_progress(), like most other callsites do. There's an RFC series to add a GIT_TEST_CHECK_PROGRESS=1 mode[1] which catches this issue in the 'fetch.writeCommitGraph' and 'fetch.writeCommitGraph with submodules' tests in 't5510-fetch.sh'. The GIT_TEST_CHECK_PROGRESS=1 mode is not part of this series, but future changes to progress.c may add it or similar assertions to catch this and similar bugs elsewhere. 1. https://lore.kernel.org/git/20210620200303.2328957-1-szeder.dev@gmail.com/ Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-09 09:58:19 -07:00
Johannes Schindelin	957ba814bf	commit-graph: when closing the graph, also release the slab The slab has information about the commit graph. That means that it is meaningless (and even misleading) when the commit graph was closed. This seems not to matter currently, but we're about to fix a Windows-specific bug where `git pull` does not close the object store before fetching (risking that an implicit auto-gc fails to remove the now-obsolete pack file(s)), and once we have that bug fix in place, it does matter: after that bug fix, we will open the object store, do some stuff with it, then close it, fetch, and then open it again, and do more stuff. If we close the commit graph without releasing the corresponding slab, we're hit by a symptom like this in t5520.19: BUG: commit-reach.c:85: bad generation skip 9223372036854775807 > 3 at 5cd378271655d43a3b4477520014f02213ad1546 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-08 12:17:14 -07:00
Patrick Steinhardt	f559d6d45e	revision: avoid hitting packfiles when commits are in commit-graph When queueing references in git-rev-list(1), we try to optimize parsing of commits via the commit-graph. To do so, we first look up the object's type, and if it is a commit we call `repo_parse_commit()` instead of `parse_object()`. This is quite inefficient though given that we're always uncompressing the object header in order to determine the type. Instead, we can opportunistically search the commit-graph for the object ID: in case it's found, we know it's a commit and can directly fill in the commit object without having to uncompress the object header. Expose a new function `lookup_commit_in_graph()`, which tries to find a commit in the commit-graph by ID, and convert `get_reference()` to use this function. This provides a big performance win in cases where we load references in a repository with lots of references pointing to commits. The following has been executed in a real-world repository with about 2.2 million refs: Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev Time (mean ± σ): 4.458 s ± 0.044 s [User: 4.115 s, System: 0.342 s] Range (min … max): 4.409 s … 4.534 s 10 runs Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev Time (mean ± σ): 3.089 s ± 0.015 s [User: 2.768 s, System: 0.321 s] Range (min … max): 3.061 s … 3.105 s 10 runs Summary 'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran 1.44 ± 0.02 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-09 09:51:12 -07:00
Patrick Steinhardt	809ea28f80	commit-graph: split out function to search commit position The function `find_commit_in_graph()` assumes that the caller has passed an object which was already determined to be a commit given that it will access the commit's graph position, which is stored in a commit slab. In a subsequent patch, we want to search for an object ID though without knowing whether it is a commit or not, which is not currently possible. Split out the logic to search the commit graph for a given object ID to prepare for this change. This commit also renames the function to `find_commit_pos_in_graph()`, which more accurately reflects what this function does. Furthermore, in order to allow for the searched object ID to be const, we need to adjust `bsearch_graph()`'s signature to accept a constant object ID as input, too. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-09 09:51:12 -07:00
Junio C Hamano	dd6d3c90ee	Merge branch 'ab/attribute-format' Many "printf"-like helper functions we have have been annotated with __attribute__() to catch placeholder/parameter mismatches. * ab/attribute-format: advice.h: add missing __attribute__((format)) & fix usage .h: add a few missing __attribute__((format)) .c static functions: add missing __attribute__((format)) sequencer.c: move static function to avoid forward decl *.c static functions: don't forward-declare __attribute__	2021-07-28 13:17:59 -07:00
Ævar Arnfjörð Bjarmason	48ca53cac4	*.c static functions: add missing __attribute__((format)) Add missing __attribute__((format)) function attributes to various "static" functions that take printf arguments. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-13 15:20:20 -07:00
Taylor Blau	15316a4732	commit-graph: rewrite to use checksum_valid() Rewrite an existing caller in `git commit-graph verify` to take advantage of checksum_valid(). Note that the replacement isn't a verbatim cut-and-paste, since the new function avoids using hashfile at all and instead talks to the_hash_algo directly, but it is functionally equivalent. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-06-28 20:36:17 -07:00
brian m. carlson	72871b132c	commit-graph: don't store file hashes as struct object_id The idea behind struct object_id is that it is supposed to represent the identifier of a standard Git object or a special pseudo-object like the all-zeros object ID. In this case, we have file hashes, which, while similar, are distinct from the identifiers of objects. Switch these code paths to use an unsigned char array. This is both more logically consistent and it means that we need not set the algorithm identifier for the struct object_id. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-27 16:31:39 +09:00
brian m. carlson	92e2cab96b	Always use oidread to read into struct object_id In the future, we'll want oidread to automatically set the hash algorithm member for an object ID we read into it, so ensure we use oidread instead of hashcpy everywhere we're copying a hash value into a struct object_id. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-27 16:31:38 +09:00
Junio C Hamano	d20fa3cf9d	Merge branch 'ds/commit-graph-generation-config' A new configuration variable has been introduced to allow choosing which version of the generation number gets used in the commit-graph file. * ds/commit-graph-generation-config: commit-graph: use config to specify generation type commit-graph: create local repository pointer	2021-03-22 14:00:23 -07:00
René Scharfe	ca56dadb4b	use CALLOC_ARRAY Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-13 16:00:09 -08:00
Junio C Hamano	660dd97a62	Merge branch 'ds/chunked-file-api' The common code to deal with "chunked file format" that is shared by the multi-pack-index and commit-graph files have been factored out, to help codepaths for both filetypes to become more robust. * ds/chunked-file-api: commit-graph.c: display correct number of chunks when writing chunk-format: add technical docs chunk-format: restore duplicate chunk checks midx: use 64-bit multiplication for chunk sizes midx: use chunk-format read API commit-graph: use chunk-format read API chunk-format: create read chunk API midx: use chunk-format API in write_midx_internal() midx: drop chunk progress during write midx: return success/failure in chunk write methods midx: add num_large_offsets to write_midx_context midx: add pack_perm to write_midx_context midx: add entries to write_midx_context midx: use context in write_midx_pack_names() midx: rename pack_info to write_midx_context commit-graph: use chunk-format write API chunk-format: create chunk format write API commit-graph: anonymize data in chunk_write_fn	2021-03-01 14:02:57 -08:00
Junio C Hamano	c0b27e3964	Merge branch 'js/commit-graph-warning' * js/commit-graph-warning: Revert "commit-graph: when incompatible with graphs, indicate why"	2021-03-01 09:21:24 -08:00
Junio C Hamano	cdc986a7c2	Revert "commit-graph: when incompatible with graphs, indicate why" This reverts commit `c85eec7fc3`, as it is a bit overzealous, we are in prerelease freeze, and we want to have enough time to get this right and cook in 'next'. cf. <8735xgkvuo.fsf@evledraar.gmail.com>	2021-03-01 09:19:37 -08:00
Derrick Stolee	702110aac6	commit-graph: use config to specify generation type We have two established generation number versions: 1: topological levels 2: corrected commit dates The corrected commit dates are enabled by default, but they also write extra data in the GDAT and GDOV chunks. Services that host Git data might want to have more control over when this feature rolls out than just updating the Git binaries. Add a new "commitGraph.generationVersion" config option that specifies the intended generation number version. If this value is less than 2, then the GDAT chunk is never written _or read_ from an existing file. This can replace our use of the GIT_TEST_COMMIT_GRAPH_NO_GDAT environment variable in the test suite. Remove it. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-25 15:10:41 -08:00
Derrick Stolee	c7ef8fe608	commit-graph: create local repository pointer The write_commit_graph() method uses 'the_repository' in a few places. A new need for a repository pointer is coming in the following change, so group these instances into a local variable 'r' that could eventually become part of the method signature, if so desired. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-25 15:10:40 -08:00
Taylor Blau	c4ff24bbb3	commit-graph.c: display correct number of chunks when writing When writing a commit-graph, a progress meter is shown which indicates the number of pieces of data to write (one per commit in each chunk). In `47410aa837` (commit-graph: use chunk-format write API, 2021-02-18), the number of chunks became tracked by the new chunk-format API. But a stray local variable was left behind from when write_commit_graph_file() used to keep track of the same. Since this was no longer updated after `47410aa837`, the progress meter appeared broken: $ git commit-graph write --reachable Expanding reachable commits in commit graph: 837569, done. Writing out commit graph in 3 passes: 166% (4187845/2512707), done. Drop the local variable and rely instead on the chunk-format API to tell us the correct number of chunks. Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-24 11:44:34 -08:00
Andrzej Hunt	bf4bb9f9f5	commit-graph: avoid leaking topo_levels slab in write_commit_graph() write_commit_graph initialises topo_levels using init_topo_level_slab(), next it calls compute_topological_levels() which can cause the slab to grow, we therefore need to clear the slab again using clear_topo_level_slab() when we're done. First introduced in `72a2bfca` (commit-graph: add a slab to store topological levels, 2021-01-16). LeakSanitizer output: ==1026==ERROR: LeakSanitizer: detected memory leaks Direct leak of 8 byte(s) in 1 object(s) allocated from: #0 0x498ae9 in realloc /src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3 #1 0xafbed8 in xrealloc /src/git/wrapper.c:126:8 #2 0x7966d1 in topo_level_slab_at_peek /src/git/commit-graph.c:71:1 #3 0x7965e0 in topo_level_slab_at /src/git/commit-graph.c:71:1 #4 0x78fbf5 in compute_topological_levels /src/git/commit-graph.c:1472:12 #5 0x78c5c3 in write_commit_graph /src/git/commit-graph.c:2456:2 #6 0x535c5f in graph_write /src/git/builtin/commit-graph.c:299:6 #7 0x5350ca in cmd_commit_graph /src/git/builtin/commit-graph.c:337:11 #8 0x4cddb1 in run_builtin /src/git/git.c:453:11 #9 0x4cabe2 in handle_builtin /src/git/git.c:704:3 #10 0x4cd084 in run_argv /src/git/git.c:771:4 #11 0x4ca424 in cmd_main /src/git/git.c:902:19 #12 0x707fb6 in main /src/git/common-main.c:52:11 #13 0x7fee4249383f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2083f) Indirect leak of 524256 byte(s) in 1 object(s) allocated from: #0 0x498942 in calloc /src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3 #1 0xafc088 in xcalloc /src/git/wrapper.c:140:8 #2 0x796870 in topo_level_slab_at_peek /src/git/commit-graph.c:71:1 #3 0x7965e0 in topo_level_slab_at /src/git/commit-graph.c:71:1 #4 0x78fbf5 in compute_topological_levels /src/git/commit-graph.c:1472:12 #5 0x78c5c3 in write_commit_graph /src/git/commit-graph.c:2456:2 #6 0x535c5f in graph_write /src/git/builtin/commit-graph.c:299:6 #7 0x5350ca in cmd_commit_graph /src/git/builtin/commit-graph.c:337:11 #8 0x4cddb1 in run_builtin /src/git/git.c:453:11 #9 0x4cabe2 in handle_builtin /src/git/git.c:704:3 #10 0x4cd084 in run_argv /src/git/git.c:771:4 #11 0x4ca424 in cmd_main /src/git/git.c:902:19 #12 0x707fb6 in main /src/git/common-main.c:52:11 #13 0x7fee4249383f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2083f) SUMMARY: AddressSanitizer: 524264 byte(s) leaked in 2 allocation(s). Signed-off-by: Andrzej Hunt <ajrhunt@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 13:45:01 -08:00
Derrick Stolee	2692c2f6fd	commit-graph: use chunk-format read API Instead of parsing the table of contents directly, use the chunk-format API methods read_table_of_contents() and pair_chunk(). While the current implementation loses the duplicate-chunk detection, that will be added in a future change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Derrick Stolee	47410aa837	commit-graph: use chunk-format write API The commit-graph write logic is ready to make use of the chunk-format write API. Each chunk write method is already in the correct prototype. We only need to use the 'struct chunkfile' pointer and the correct API calls. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Junio C Hamano	726b11d68a	Merge branch 'js/commit-graph-warning' When certain features (e.g. grafts) used in the repository are incompatible with the use of the commit-graph, we used to silently turned commit-graph off; we now tell the user what we are doing. * js/commit-graph-warning: commit-graph: when incompatible with graphs, indicate why	2021-02-17 17:21:42 -08:00
Junio C Hamano	5bd0b21bf7	Merge branch 'ds/commit-graph-genno-fix' Fix incremental update of commit-graph file around corrected commit date data. * ds/commit-graph-genno-fix: commit-graph: prepare commit graph commit-graph: be extra careful about mixed generations commit-graph: compute generations separately commit-graph: validate layers for generation data commit-graph: always parse before commit_graph_data_at() commit-graph: use repo_parse_commit	2021-02-17 17:21:40 -08:00
Junio C Hamano	8b4701ae4f	Merge branch 'ak/corrected-commit-date' The commit-graph learned to use corrected commit dates instead of the generation number to help topological revision traversal. * ak/corrected-commit-date: doc: add corrected commit date info commit-reach: use corrected commit dates in paint_down_to_common() commit-graph: use generation v2 only if entire chain does commit-graph: implement generation data chunk commit-graph: implement corrected commit date commit-graph: return 64-bit generation number commit-graph: add a slab to store topological levels t6600-test-reach: generalize *_three_modes commit-graph: consolidate fill_commit_graph_info revision: parse parent in indegree_walk_step() commit-graph: fix regression when computing Bloom filters	2021-02-17 17:21:40 -08:00
Johannes Schindelin	c85eec7fc3	commit-graph: when incompatible with graphs, indicate why When `gc.writeCommitGraph = true`, it is possible that the commit-graph is _still_ not written: replace objects, grafts and shallow repositories are incompatible with the commit-graph feature. Under such circumstances, we need to indicate to the user why the commit-graph was not written instead of staying silent about it. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-11 09:33:01 -08:00
Junio C Hamano	9d5b1c06ac	Merge branch 'jk/use-oid-pos' Code clean-up to ensure our use of hashtables using object names as keys use the "struct object_id" objects, not the raw hash values. * jk/use-oid-pos: oid_pos(): access table through const pointers hash_pos(): convert to oid_pos() rerere: use strmap to store rerere directories rerere: tighten rr-cache dirname check rerere: check dirname format while iterating rr_cache directory commit_graft_pos(): take an oid instead of a bare hash	2021-02-10 14:48:31 -08:00

1 2 3 4 5 ...

314 commits