development/git - HydraGit

mirror of https://github.com/git/git synced 2024-10-30 14:03:28 +00:00

Author	SHA1	Message	Date
Eric W. Biederman	15a1ca1abe	repository: add a compatibility hash algorithm We currently have support for using a full stage 4 SHA-256 implementation. However, we'd like to support interoperability with SHA-1 repositories as well. The transition plan anticipates a compatibility hash algorithm configuration option that we can use to implement support for this. Let's add an element to the repository structure that indicates the compatibility hash algorithm so we can use it when we need to consider interoperability between algorithms. Add a helper function repo_set_compat_hash_algo that takes a compatibility hash algorithm and sets "repo->compat_hash_algo". If GIT_HASH_UNKNOWN is passed as the compatibility hash algorithm "repo->compat_hash_algo" is set to NULL. For now, the code results in "repo->compat_hash_algo" always being set to NULL, but that will change once a configuration option is added. Inspired-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-10-02 14:57:38 -07:00
Junio C Hamano	a1264a08a1	Merge branch 'en/header-split-cache-h-part-3' Header files cleanup. * en/header-split-cache-h-part-3: (28 commits) fsmonitor-ll.h: split this header out of fsmonitor.h hash-ll, hashmap: move oidhash() to hash-ll object-store-ll.h: split this header out of object-store.h khash: name the structs that khash declares merge-ll: rename from ll-merge git-compat-util.h: remove unneccessary include of wildmatch.h builtin.h: remove unneccessary includes list-objects-filter-options.h: remove unneccessary include diff.h: remove unnecessary include of oidset.h repository: remove unnecessary include of path.h log-tree: replace include of revision.h with simple forward declaration cache.h: remove this no-longer-used header read-cache*.h: move declarations for read-cache.c functions from cache.h repository.h: move declaration of the_index from cache.h merge.h: move declarations for merge.c from cache.h diff.h: move declaration for global in diff.c from cache.h preload-index.h: move declarations for preload-index.c from elsewhere sparse-index.h: move declarations for sparse-index.c from cache.h name-hash.h: move declarations for name-hash.c from cache.h run-command.h: move declarations for run-command.c from cache.h ...	2023-06-29 16:43:21 -07:00
Junio C Hamano	d9f9f6b358	Merge branch 'ds/disable-replace-refs' Introduce a mechanism to disable replace refs globally and per repository. * ds/disable-replace-refs: repository: create read_replace_refs setting replace-objects: create wrapper around setting repository: create disable_replace_refs()	2023-06-22 16:29:06 -07:00
Junio C Hamano	f2ffc74186	Merge branch 'tb/pack-bitmap-traversal-with-boundary' The object traversal using reachability bitmap done by "pack-object" has been tweaked to take advantage of the fact that using "boundary" commits as representative of all the uninteresting ones can save quite a lot of object enumeration. * tb/pack-bitmap-traversal-with-boundary: pack-bitmap.c: use commit boundary during bitmap traversal pack-bitmap.c: extract `fill_in_bitmap()` object: add object_array initializer helper function	2023-06-22 16:29:05 -07:00
Elijah Newren	c339932bd8	repository: remove unnecessary include of path.h This also made it clear that several .c files that depended upon path.h were missing a #include for it; add the missing includes while at it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-21 13:39:53 -07:00
Elijah Newren	bc47f16db2	repository.h: move declaration of the_index from cache.h the_index is a global variable defined in repository.c; as such, its declaration feels better suited living in repository.h rather than cache.h. Move it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-21 13:39:53 -07:00
Elijah Newren	fbffdfb11c	preload-index.h: move declarations for preload-index.c from elsewhere We already have a preload-index.c file; move the declarations for the functions in that file into a new preload-index.h. These were previously split between cache.h and repository.h. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-21 13:39:53 -07:00
Derrick Stolee	9c7d1b057f	repository: create read_replace_refs setting The 'read_replace_refs' global specifies whether or not we should respect the references of the form 'refs/replace/<oid>' to replace which object we look up when asking for '<oid>'. This global has caused issues when it is not initialized properly, such as in `b6551feadf` (merge-tree: load default git config, 2023-05-10). To make this more robust, move its config-based initialization out of git_default_config and into prepare_repo_settings(). This provides a repository-scoped version of the 'read_replace_refs' global. The global still has its purpose: it is disabled process-wide by the GIT_NO_REPLACE_OBJECTS environment variable or by a call to disable_replace_refs() in some specific Git commands. Since we already encapsulated the use of the constant inside replace_refs_enabled(), we can perform the initialization inside that method, if necessary. This solves the problem of forgetting to check the config, as we will check it before returning this value. Due to this encapsulation, the global can move to be static within replace-object.c. There is an interesting behavior change possible here: we now have a repository-scoped understanding of this config value. Thus, if there was a command that recurses into submodules and might follow replace refs, then it would now respect the core.useReplaceRefs config value in each repository. 'git grep --recurse-submodules' is such a command that recurses into submodules in-process. We can demonstrate the granularity of this config value via a test in t7814. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-12 13:34:55 -07:00
Victoria Dye	3867f6d650	repository: move 'repository_format_worktree_config' to repo scope Move 'repository_format_worktree_config' out of the global scope and into the 'repository' struct. This change is similar to how 'repository_format_partial_clone' was moved in `ebaf3bcf1a` (repository: move global r_f_p_c to repo struct, 2021-06-17), adding it to the 'repository' struct and updating 'setup.c' & 'repository.c' functions to assign the value appropriately. The primary goal of this change is to be able to load the worktree config of a submodule depending on whether that submodule - not its superproject - has 'extensions.worktreeConfig' enabled. To ensure 'do_git_config_sequence()' has access to the newly repo-scoped configuration, add a 'struct repository' argument to 'do_git_config_sequence()' and pass it the 'repo' value from 'config_with_options()'. Finally, add/update tests in 't3007-ls-files-recurse-submodules.sh' to verify 'extensions.worktreeConfig' is read an used independently by superprojects and submodules. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-05-26 13:53:41 +09:00
Taylor Blau	b0afdce5da	pack-bitmap.c: use commit boundary during bitmap traversal When reachability bitmap coverage exists in a repository, Git will use a different (and hopefully faster) traversal to compute revision walks. Consider a set of positive and negative tips (which we'll refer to with their standard bitmap parlance by "wants", and "haves"). In order to figure out what objects exist between the tips, the existing traversal in `prepare_bitmap_walk()` does something like: 1. Consider if we can even compute the set of objects with bitmaps, and fall back to the usual traversal if we cannot. For example, pathspec limiting traversals can't be computed using bitmaps (since they don't know which objects are at which paths). The same is true of certain kinds of non-trivial object filters. 2. If we can compute the traversal with bitmaps, partition the (dereferenced) tips into two object lists, "haves", and "wants", based on whether or not the objects have the UNINTERESTING flag, respectively. 3. Fall back to the ordinary object traversal if either (a) there are more than zero haves, none of which are in the bitmapped pack or MIDX, or (b) there are no wants. 4. Construct a reachability bitmap for the "haves" side by walking from the revision tips down to any existing bitmaps, OR-ing in any bitmaps as they are found. 5. Then do the same for the "wants" side, stopping at any objects that appear in the "haves" bitmap. 6. Filter the results if any object filter (that can be easily computed with bitmaps alone) was given, and then return back to the caller. When there is good bitmap coverage relative to the traversal tips, this walk is often significantly faster than an ordinary object traversal because it can visit far fewer objects. But in certain cases, it can be significantly slower than the usual object traversal. Why? Because we need to compute complete bitmaps on either side of the walk. If either one (or both) of the sides require walking many (or all!) objects before they get to an existing bitmap, the extra bitmap machinery is mostly or all overhead. One of the benefits, however, is that even if the walk is slower, bitmap traversals are guaranteed to provide an exact answer. Unlike the traditional object traversal algorithm, which can over-count the results by not opening trees for older commits, the bitmap walk builds an exact reachability bitmap for either side, meaning the results are never over-counted. But producing non-exact results is OK for our traversal here (both in the bitmap case and not), as long as the results are over-counted, not under. Relaxing the bitmap traversal to allow it to produce over-counted results gives us the opportunity to make some significant improvements. Instead of the above, the new algorithm only has to walk from the boundary down to the nearest bitmap, instead of from each of the UNINTERESTING tips. The boundary-based approach still has degenerate cases, but we'll show in a moment that it is often a significant improvement. The new algorithm works as follows: 1. Build a (partial) bitmap of the haves side by first OR-ing any bitmap(s) that already exist for UNINTERESTING commits between the haves and the boundary. 2. For each commit along the boundary, add it as a fill-in traversal tip (where the traversal terminates once an existing bitmap is found), and perform fill-in traversal. 3. Build up a complete bitmap of the wants side as usual, stopping any time we intersect the (partial) haves side. 4. Return the results. And is more-or-less equivalent to using the old algorithm with this invocation: $ git rev-list --objects --use-bitmap-index $WANTS --not \ $(git rev-list --objects --boundary $WANTS --not $HAVES \| perl -lne 'print $1 if /^-(.)/') The new result performs significantly better in many cases, particularly when the distance from the boundary commit(s) to an existing bitmap is shorter than the distance from (all of) the have tips to the nearest bitmapped commit. Note that when using the old bitmap traversal algorithm, the results can be slower* than without bitmaps! Under the new algorithm, the result is computed faster with bitmaps than without (at the cost of over-counting the true number of objects in a similar fashion as the non-bitmap traversal): # (Computing the number of tagged objects not on any branches # without bitmaps). $ time git rev-list --count --objects --tags --not --branches 20 real 0m1.388s user 0m1.092s sys 0m0.296s # (Computing the same query using the old bitmap traversal). $ time git rev-list --count --objects --tags --not --branches --use-bitmap-index 19 real 0m22.709s user 0m21.628s sys 0m1.076s # (this commit) $ time git.compile rev-list --count --objects --tags --not --branches --use-bitmap-index 19 real 0m1.518s user 0m1.234s sys 0m0.284s The new algorithm is still slower than not using bitmaps at all, but it is nearly a 15-fold improvement over the existing traversal. In a more realistic setting (using my local copy of git.git), I can observe a similar (if more modest) speed-up: $ argv="--count --objects --branches --not --tags" hyperfine \ -n 'no bitmaps' "git.compile rev-list $argv" \ -n 'existing traversal' "git.compile rev-list --use-bitmap-index $argv" \ -n 'boundary traversal' "git.compile -c pack.useBitmapBoundaryTraversal=true rev-list --use-bitmap-index $argv" Benchmark 1: no bitmaps Time (mean ± σ): 124.6 ms ± 2.1 ms [User: 103.7 ms, System: 20.8 ms] Range (min … max): 122.6 ms … 133.1 ms 22 runs Benchmark 2: existing traversal Time (mean ± σ): 368.6 ms ± 3.0 ms [User: 325.3 ms, System: 43.1 ms] Range (min … max): 365.1 ms … 374.8 ms 10 runs Benchmark 3: boundary traversal Time (mean ± σ): 167.6 ms ± 0.9 ms [User: 139.5 ms, System: 27.9 ms] Range (min … max): 166.1 ms … 169.2 ms 17 runs Summary 'no bitmaps' ran 1.34 ± 0.02 times faster than 'boundary traversal' 2.96 ± 0.05 times faster than 'existing traversal' Here, the new algorithm is also still slower than not using bitmaps, but represents a more than 2-fold improvement over the existing traversal in a more modest example. Since this algorithm was originally written (nearly a year and a half ago, at the time of writing), the bitmap lookup table shipped, making the new algorithm's result more competitive. A few other future directions for improving bitmap traversal times beyond not using bitmaps at all: - Decrease the cost to decompress and OR together many bitmaps together (particularly when enumerating the uninteresting side of the walk). Here we could explore more efficient bitmap storage techniques, like Roaring+Run and/or use SIMD instructions to speed up ORing them together. - Store pseudo-merge bitmaps, which could allow us to OR together fewer "summary" bitmaps (which would also help with the above). Helped-by: Jeff King <peff@peff.net> Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-05-08 12:05:55 -07:00
Junio C Hamano	fc23c397c7	Merge branch 'tb/enable-cruft-packs-by-default' When "gc" needs to retain unreachable objects, packing them into cruft packs (instead of exploding them into loose object files) has been offered as a more efficient option for some time. Now the use of cruft packs has been made the default and no longer considered an experimental feature. * tb/enable-cruft-packs-by-default: repository.h: drop unused `gc_cruft_packs` builtin/gc.c: make `gc.cruftPacks` enabled by default t/t9300-fast-import.sh: prepare for `gc --cruft` by default t/t6500-gc.sh: add additional test cases t/t6500-gc.sh: refactor cruft pack tests t/t6501-freshen-objects.sh: prepare for `gc --cruft` by default t/t5304-prune.sh: prepare for `gc --cruft` by default builtin/gc.c: ignore cruft packs with `--keep-largest-pack` builtin/repack.c: fix incorrect reference to '-C' pack-write.c: plug a leak in stage_tmp_packfiles()	2023-04-28 16:03:03 -07:00
Taylor Blau	029a632c35	repository.h: drop unused `gc_cruft_packs` As of the previous commit, all callers that need to read the value of `gc.cruftPacks` do so outside without using the `repo_settings` struct, making its `gc_cruft_packs` unused. Drop it accordingly. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-18 14:56:48 -07:00
Taylor Blau	dbcf611617	pack-revindex: introduce `pack.readReverseIndex` Since `1615c567b8` (Documentation/config/pack.txt: advertise 'pack.writeReverseIndex', 2021-01-25), we have had the `pack.writeReverseIndex` configuration option, which tells Git whether or not it is allowed to write a ".rev" file when indexing a pack. Introduce a complementary configuration knob, `pack.readReverseIndex` to control whether or not Git will read any ".rev" file(s) that may be available on disk. This option is useful for debugging, as well as disabling the effect of ".rev" files in certain instances. This is useful because of the trade-off[^1] between the time it takes to generate a reverse index (slow from scratch, fast when reading an existing ".rev" file), and the time it takes to access a record (the opposite). For example, even though it is faster to use the on-disk reverse index when computing the on-disk size of a packed object, it is slower to enumerate the same value for all objects. Here are a couple of examples from linux.git. When computing the above for a single object, using the on-disk reverse index is significantly faster: $ git rev-parse HEAD >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in' Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 302.5 ms ± 12.5 ms [User: 258.7 ms, System: 43.6 ms] Range (min … max): 291.1 ms … 328.1 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 3.9 ms ± 0.3 ms [User: 1.6 ms, System: 2.4 ms] Range (min … max): 2.0 ms … 4.4 ms 801 runs Summary 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran 77.29 ± 7.14 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in' , but when instead trying to compute the on-disk object size for all objects in the repository, using the ".rev" file is a disadvantage over creating the reverse index from scratch: $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects Time (mean ± σ): 8.258 s ± 0.035 s [User: 7.949 s, System: 0.308 s] Range (min … max): 8.199 s … 8.293 s 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects Time (mean ± σ): 16.976 s ± 0.107 s [User: 16.706 s, System: 0.268 s] Range (min … max): 16.839 s … 17.105 s 10 runs Summary 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' ran 2.06 ± 0.02 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' Luckily, the results when running `git cat-file` with `--unordered` are closer together: $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects Time (mean ± σ): 5.066 s ± 0.105 s [User: 4.792 s, System: 0.274 s] Range (min … max): 4.943 s … 5.220 s 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects Time (mean ± σ): 6.193 s ± 0.069 s [User: 5.937 s, System: 0.255 s] Range (min … max): 6.145 s … 6.356 s 10 runs Summary 'git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' ran 1.22 ± 0.03 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' Because the equilibrium point between these two is highly machine- and repository-dependent, allow users to configure whether or not they will read any ".rev" file(s) with this configuration knob. [^1]: Generating a reverse index in memory takes O(N) time (where N is the number of objects in the repository), since we use a radix sort. Reading an entry from an on-disk ".rev" file is slower since each operation is bound by disk I/O instead of memory I/O. In order to compute the on-disk size of a packed object, we need to find the offset of our object, and the adjacent object (the on-disk size difference of these two). Finding the first offset requires a binary search. Finding the latter involves a single .rev lookup. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-13 07:55:46 -07:00
Elijah Newren	f332121e75	treewide: remove unnecessary git-compat-util.h includes in headers For sanity, we should probably do one of the following: (a) make C and header files both depend upon everything they need (b) consistently exclude git-compat-util.h from headers and require it be the first include in C files Currently, we have some of the headers following (a) and others following (b), which makes things messy. In the past I was pushed towards (b), as per [1] and [2]. Further, during this series I discovered that this mixture empirically will mean that we end up with C files that do not directly include git-compat-util.h, and do include headers that don't include git-compat-util.h, with the result that we likely have headers included before an indirect inclusion of git-compat-util.h. Since git-compat-util.h has tricky platform-specific stuff that is meant to be included before everything else, this state of affairs is risky and may lead to things breaking in subtle ways (and only on some platforms) as per [1] and [2]. Since including git-compat-util.h in existing header files makes it harder for us to catch C files that are missing that include, let's switch to (b) to make the enforcement of this rule easier. Remove the inclusion of git-compat-util.h from header files other than the ones that have been approved as alternate first includes. [1] https://lore.kernel.org/git/20180811173406.GA9119@sigill.intra.peff.net/ [2] https://lore.kernel.org/git/20180811174301.GA9287@sigill.intra.peff.net/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-23 17:25:28 -08:00
Derrick Stolee	17194b195d	features: feature.manyFiles implies fast index writes The recent addition of the index.skipHash config option allows index writes to speed up by skipping the hash computation for the trailing checksum. This is particularly critical for repositories with many files at HEAD, so add this config option to two cases where users in that scenario may opt-in to such behavior: 1. The feature.manyFiles config option enables some options that are helpful for repositories with many files at HEAD. 2. 'scalar register' and 'scalar reconfigure' set config options that optimize for large repositories. In both of these cases, set index.skipHash=true to gain this speedup. Add tests that demonstrate the proper way that index.skipHash=true can override feature.manyFiles=true. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-07 07:46:14 +09:00
Taylor Blau	bdd42e34e3	Merge branch 'es/mark-gc-cruft-as-experimental' Enable gc.cruftpacks by default for those who opt into feature.experimental setting. * es/mark-gc-cruft-as-experimental: config: let feature.experimental imply gc.cruftPacks=true gc: add tests for --cruft and friends	2022-11-08 17:14:48 -05:00
Emily Shaffer	c695592850	config: let feature.experimental imply gc.cruftPacks=true We are interested in exploring whether gc.cruftPacks=true should become the default value. To determine whether it is safe to do so, let's encourage more users to try it out. Users who have set feature.experimental=true have already volunteered to try new and possibly-breaking config changes, so let's try this new default with that set of users. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-26 14:39:31 -07:00
Junio C Hamano	f322e9f51b	Merge branch 'ab/submodule-helper-prep' Code clean-up of "git submodule--helper". * ab/submodule-helper-prep: (33 commits) submodule--helper: fix bad config API usage submodule--helper: libify even more "die" paths for module_update() submodule--helper: libify more "die" paths for module_update() submodule--helper: check repo{_submodule,}_init() return values submodule--helper: libify "must_die_on_failure" code paths (for die) submodule--helper update: don't override 'checkout' exit code submodule--helper: libify "must_die_on_failure" code paths submodule--helper: libify determine_submodule_update_strategy() submodule--helper: don't exit() on failure, return submodule--helper: use "code" in run_update_command() submodule API: don't handle SM_..{UNSPECIFIED,COMMAND} in to_string() submodule--helper: don't call submodule_strategy_to_string() in BUG() submodule--helper: add missing braces to "else" arm submodule--helper: return "ret", not "1" from update_submodule() submodule--helper: rename "int res" to "int ret" submodule--helper: don't redundantly check "else if (res)" submodule--helper: refactor "errmsg_str" to be a "struct strbuf" submodule--helper: add "const" to passed "struct update_data" submodule--helper: add "const" to copy of "update_data" submodule--helper: add "const" to passed "module_clone_data" ...	2022-09-13 11:38:23 -07:00
Ævar Arnfjörð Bjarmason	1e8697b5c4	submodule--helper: check repo{_submodule,}_init() return values Fix code added in `ce125d431a` (submodule: extract path to submodule gitdir func, 2021-09-15) and `a77c3fcb5e` (submodule--helper: get remote names from any repository, 2022-03-04) which failed to check the return values of repo_init() and repo_submodule_init(). If we failed to initialize the repository or submodule we could segfault when trying to access the invalid repository structs. Let's also check that these were the only such logic errors in the codebase by making use of the "warn_unused_result" attribute. This is valid as of GCC 3.4.0 (and clang will catch it via its faking of __GNUC__ ). As the comment being added to git-compat-util.h we're piggy-backing on the LAST_ARG_MUST_BE_NULL version check out of lazyness. See `9fe3edc47f` (Add the LAST_ARG_MUST_BE_NULL macro, 2013-07-18) for its addition. The marginal benefit of covering gcc 3.4.0..4.0.0 is near-zero (or zero) at this point. It mostly matters that we catch this somewhere. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-02 09:16:24 -07:00
Taylor Blau	a92d8523ce	commit-graph: pass repo_settings instead of repository The parse_commit_graph() function takes a 'struct repository ' pointer, but it only ever accesses config settings (either directly or through the .settings field of the repo struct). Move all relevant config settings into the repo_settings struct, and update parse_commit_graph() and its existing callers so that it takes 'struct repo_settings ' instead. Callers of parse_commit_graph() will now need to call prepare_repo_settings() themselves, or initialize a 'struct repo_settings' directly. Prior to `ab14d0676c` (commit-graph: pass a 'struct repository *' in more places, 2020-09-09), parsing a commit-graph was a pure function depending only on the contents of the commit-graph itself. Commit `ab14d0676c` introduced a dependency on a `struct repository` pointer, and later commits such as `b66d84756f` (commit-graph: respect 'commitGraph.readChangedPaths', 2020-09-09) added dependencies on config settings, which were accessed through the `settings` field of the repository pointer. This field was initialized via a call to `prepare_repo_settings()`. Additionally, this fixes an issue in fuzz-commit-graph: In `44c7e62` (2021-12-06, repo-settings:prepare_repo_settings only in git repos), prepare_repo_settings was changed to issue a BUG() if it is called by a process whose CWD is not a Git repository. The combination of commits mentioned above broke fuzz-commit-graph, which attempts to parse arbitrary fuzzing-engine-provided bytes as a commit graph file. Prior to this change, parse_commit_graph() called prepare_repo_settings(), but since we run the fuzz tests without a valid repository, we are hitting the BUG() from `44c7e62` for every test case. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-14 15:42:17 -07:00
Junio C Hamano	439c1e6d5d	Merge branch 'jh/builtin-fsmonitor-part2' Built-in fsmonitor (part 2). * jh/builtin-fsmonitor-part2: (30 commits) t7527: test status with untracked-cache and fsmonitor--daemon fsmonitor: force update index after large responses fsmonitor--daemon: use a cookie file to sync with file system fsmonitor--daemon: periodically truncate list of modified files t/perf/p7519: add fsmonitor--daemon test cases t/perf/p7519: speed up test on Windows t/perf/p7519: fix coding style t/helper/test-chmtime: skip directories on Windows t/perf: avoid copying builtin fsmonitor files into test repo t7527: create test for fsmonitor--daemon t/helper/fsmonitor-client: create IPC client to talk to FSMonitor Daemon help: include fsmonitor--daemon feature flag in version info fsmonitor--daemon: implement handle_client callback compat/fsmonitor/fsm-listen-darwin: implement FSEvent listener on MacOS compat/fsmonitor/fsm-listen-darwin: add MacOS header files for FSEvent compat/fsmonitor/fsm-listen-win32: implement FSMonitor backend on Windows fsmonitor--daemon: create token-based changed path cache fsmonitor--daemon: define token-ids fsmonitor--daemon: add pathname classification fsmonitor--daemon: implement 'start' command ...	2022-04-04 10:56:24 -07:00
Jeff Hostetler	1e0ea5c431	fsmonitor: config settings are repository-specific Move fsmonitor config settings to a new and opaque `struct fsmonitor_settings` structure. Add a lazily-loaded pointer to this into `struct repo_settings` Create an `enum fsmonitor_mode` type in `struct fsmonitor_settings` to represent the state of fsmonitor. This lets us represent which, if any, fsmonitor provider (hook or IPC) is enabled. Create `fsm_settings__get_()` getters to lazily look up fsmonitor- related config settings. Get rid of the `core_fsmonitor` global variable. Move the code to lookup the existing `core.fsmonitor` config value into the fsmonitor settings. Create a hook pathname variable in `struct fsmonitor-settings` and only set it when in hook mode. Extend the definition of `core.fsmonitor` to be either a boolean or a hook pathname. When true, the builtin FSMonitor is used. When false or unset, no FSMonitor (neither builtin nor hook) is used. The existing `core_fsmonitor` global variable was used to store the pathname to the fsmonitor hook and* it was used as a boolean to see if fsmonitor was enabled. This dual usage and global visibility leads to confusion when we add the IPC-based provider. So lets hide the details in fsmonitor-settings.c and let it decide which provider to use in the case of multiple settings. This avoids cluttering up repo-settings.c with these private details. A future commit in builtin-fsmonitor series will add the ability to disqualify worktrees for various reasons, such as being mounted from a remote volume, where fsmonitor should not be started. Having the config settings hidden in fsmonitor-settings.c allows such worktree restrictions to override the config values used. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-25 16:04:15 -07:00
Ævar Arnfjörð Bjarmason	759f340738	repository.c: free the "path cache" in repo_clear() The "struct path_cache" added in `102de880d2` (path.c: migrate global git_path_* to take a repository argument, 2018-05-17) is only used directly by code in repository.[ch] (but populated in path.[ch]). Let's move this code to repository.[ch], and stop leaking this memory when we run repo_clear(). To avoid the cast change it from a "const char " to a "char ". This also removes the "PATH_CACHE_INIT" macro, which has never been used for anything. For the "struct repository" we already make a hard assumption that it (and "the_repository") can be identically initialized by making it a "static" variable, so making use of a "PATH_CACHE_INIT" somewhere would have been confusing. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-04 13:24:19 -08:00
Junio C Hamano	70ff41ffcf	Merge branch 'en/fetch-negotiation-default-fix' Interaction between fetch.negotiationAlgorithm and feature.experimental configuration variables has been corrected. * en/fetch-negotiation-default-fix: repo-settings: rename the traditional default fetch.negotiationAlgorithm repo-settings: fix error handling for unknown values repo-settings: fix checking for fetch.negotiationAlgorithm=default	2022-02-16 15:14:30 -08:00
Elijah Newren	714edc620c	repo-settings: rename the traditional default fetch.negotiationAlgorithm Give the traditional default fetch.negotiationAlgorithm the name 'consecutive'. Also allow a choice of 'default' to have Git decide between the choices (currently, picking 'skipping' if feature.experimental is true and 'consecutive' otherwise). Update the documentation accordingly. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-02 09:36:17 -08:00
Junio C Hamano	0dc90d954d	Merge branch 'ns/tmp-objdir' New interface into the tmp-objdir API to help in-core use of the quarantine feature. * ns/tmp-objdir: tmp-objdir: disable ref updates when replacing the primary odb tmp-objdir: new API for creating temporary writable databases	2022-01-03 16:24:15 -08:00
Neeraj Singh	ecd81dfc79	tmp-objdir: disable ref updates when replacing the primary odb When creating a subprocess with a temporary ODB, we set the GIT_QUARANTINE_ENVIRONMENT env var to tell child Git processes not to update refs, since the tmp-objdir may go away. Introduce a similar mechanism for in-process temporary ODBs when we call tmp_objdir_replace_primary_odb. Now both mechanisms set the disable_ref_updates flag on the odb, which is queried by the ref_transaction_prepare function. Peff's test case [1] was invoking ref updates via the cachetextconv setting. That particular code silently does nothing when a ref update is forbidden. See the call to notes_cache_put in fill_textconv where errors are ignored. [1] https://lore.kernel.org/git/YVOn3hDsb5pnxR53@coredump.intra.peff.net/ Reported-by: Jeff King <peff@peff.net> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-08 14:06:46 -08:00
Glen Choo	fd3cb0501e	remote: move static variables into per-repository struct remote.c does not works with non-the_repository because it stores its state as static variables. To support non-the_repository, we can use a per-repository struct for the remotes subsystem. Prepare for this change by defining a struct remote_state that holds the remotes subsystem state and move the static variables of remote.c into the_repository->remote_state. This introduces no behavioral or API changes. Signed-off-by: Glen Choo <chooglen@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-18 22:31:19 -08:00
Junio C Hamano	d8d33378ed	Merge branch 'ab/repo-settings-cleanup' Code cleanup. * ab/repo-settings-cleanup: repository.h: don't use a mix of int and bitfields repo-settings.c: simplify the setup read-cache & fetch-negotiator: check "enum" values in switch() environment.c: remove test-specific "ignore_untracked..." variable wrapper.c: add x{un,}setenv(), and use xsetenv() in environment.c	2021-10-06 13:40:11 -07:00
Ævar Arnfjörð Bjarmason	c21919f1b2	repository.h: don't use a mix of int and bitfields Change the bitfield added in `58300f4743` (sparse-index: add index.sparse config option, 2021-03-30) and `3964fc2aae` (sparse-index: add guard to ensure full index, 2021-03-30) to just use an "int" boolean instead. It might be smart to optimize the space here in the future, but by consistently using an "int" we can take its address and pass it to repo_cfg_bool(), and therefore don't need to handle "sparse_index" as a special-case when reading the "index.sparse" setting. There's no corresponding config for "command_requires_full_index", but let's change it too for consistency and to prevent future bugs creeping in due to one of these being "unsigned". Using "int" consistently also prevents subtle bugs or undesired control flow creeping in here. Before the preceding commit the initialization of "command_requires_full_index" in prepare_repo_settings() did nothing, i.e. this: r->settings.command_requires_full_index = 1 Was redundant to the earlier memset() to -1. Likewise for "sparse_index" added in `58300f4743` (sparse-index: add index.sparse config option, 2021-03-30) the code and comment added there was misleading, we weren't initializing it to off, but re-initializing it from "1" to "0", and then finally checking the config, and perhaps setting it to "1" again. I.e. we could have applied this patch before the preceding commit: + assert(r->settings.command_requires_full_index == 1); r->settings.command_requires_full_index = 1; /* * Initialize this as off. */ + assert(r->settings.sparse_index == 1); r->settings.sparse_index = 0; if (!repo_config_get_bool(r, "index.sparse", &value) && value) r->settings.sparse_index = 1; Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-22 13:15:00 -07:00
Ævar Arnfjörð Bjarmason	3050b6dfc7	repo-settings.c: simplify the setup Simplify the setup code in repo-settings.c in various ways, making the code shorter, easier to read, and requiring fewer hacks to do the same thing as it did before: Since `7211b9e753` (repo-settings: consolidate some config settings, 2019-08-13) we have memset() the whole "settings" structure to -1 in prepare_repo_settings(), and subsequently relied on the -1 value. Most of the fields did not need to be initialized to -1, and because we were doing that we had the enum labels "UNTRACKED_CACHE_UNSET" and "FETCH_NEGOTIATION_UNSET" purely to reflect the resulting state created this memset() in prepare_repo_settings(). No other code used or relied on them, more on that below. For the rest most of the subsequent "are we -1, then read xyz" can simply be removed by re-arranging what we read first. E.g. when setting the "index.version" setting we should have first read "feature.experimental", so that it (and "feature.manyfiles") can provide a default for our "index.version". Instead the code setting it, added when "feature.manyFiles"[1] was created, was using the UPDATE_DEFAULT_BOOL() macro added in an earlier commit[2]. That macro is now gone, since it was only needed for this pattern of reading things in the wrong order. This also fixes an (admittedly obscure) logic error where we'd conflate an explicit "-1" value in the config with our own earlier memset() -1. We can also remove the UPDATE_DEFAULT_BOOL() wrapper added in [3]. Using it is redundant to simply using the return value from repo_config_get_bool(), which is non-zero if the provided key exists in the config. Details on edge cases relating to the memset() to -1, continued from "more on that below" above: * UNTRACKED_CACHE_KEEP: In [4] the "unset" and "keep" handling for core.untrackedCache was consolidated. But it while we understand the "keep" value, we don't handle it differently than the case of any other unknown value. So let's retain UNTRACKED_CACHE_KEEP and remove the UNTRACKED_CACHE_UNSET setting (which was always implicitly UNTRACKED_CACHE_KEEP before). We don't need to inform any code after prepare_repo_settings() that the setting was "unset", as far as anyone else is concerned it's core.untrackedCache=keep. if "core.untrackedcache" isn't present in the config. * FETCH_NEGOTIATION_UNSET & FETCH_NEGOTIATION_NONE: Since these two two enum fields added in [5] don't rely on the memzero() setting them to "-1" anymore we don't have to provide them with explicit values. 1. `c6cc4c5afd` (repo-settings: create feature.manyFiles setting, 2019-08-13) 2. `31b1de6a09` (commit-graph: turn on commit-graph by default, 2019-08-13) 3. `31b1de6a09` (commit-graph: turn on commit-graph by default, 2019-08-13) 4. `ad0fb65999` (repo-settings: parse core.untrackedCache, 2019-08-13) 5. `aaf633c2ad` (repo-settings: create feature.experimental setting, 2019-08-13) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-22 13:15:00 -07:00
Jonathan Tan	8eb8dcf946	repository: support unabsorbed in repo_submodule_init In preparation for a subsequent commit that migrates code using add_submodule_odb() to repo_submodule_init(), teach repo_submodule_init() to support submodules with unabsorbed gitdirs. (See the documentation for "git submodule absorbgitdirs" for more information about absorbed and unabsorbed gitdirs.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-09 14:09:30 -07:00
Jonathan Tan	ef7dc2e9cc	promisor-remote: support per-repository config Instead of using global variables to store promisor remote information, store this config in struct repository instead, and add repository-agnostic non-static functions corresponding to the existing non-static functions that only work on the_repository. The actual lazy-fetching of missing objects currently does not work on repositories other than the_repository, and will still not work after this commit, so add a BUG message explaining this. A subsequent commit will remove this limitation. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-06-28 09:57:42 -07:00
Jonathan Tan	ebaf3bcf1a	repository: move global r_f_p_c to repo struct Move repository_format_partial_clone, which is currently a global variable, into struct repository. (Full support for per-repository partial clone config will be done in a subsequent commit - this is split into its own commit because of the extent of the changes needed.) The new repo-specific variable cannot be set in check_repository_format_gently() (as is currently), because that function does not know which repo it is operating on (or even whether the value is important); therefore this responsibility is delegated to the outermost caller that knows. Of all the outermost callers that know (found by looking at all functions that call clear_repository_format()), I looked at those that either read from the main Git directory or write into a struct repository. These callers have been modified accordingly (write to the_repository in the former case and write to the given struct repository in the latter case). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-06-28 09:57:41 -07:00
Derrick Stolee	58300f4743	sparse-index: add index.sparse config option When enabled, this config option signals that index writes should attempt to use sparse-directory entries. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-30 12:57:47 -07:00
Derrick Stolee	3964fc2aae	sparse-index: add guard to ensure full index Upcoming changes will introduce modifications to the index format that allow sparse directories. It will be useful to have a mechanism for converting those sparse index files into full indexes by walking the tree at those sparse directories. Name this method ensure_full_index() as it will guarantee that the index is fully expanded. This method is not implemented yet, and instead we focus on the scaffolding to declare it and call it at the appropriate time. Add a 'command_requires_full_index' member to struct repo_settings. This will be an indicator that we need the index in full mode to do certain index operations. This starts as being true for every command, then we will set it to false as some commands integrate with sparse indexes. If 'command_requires_full_index' is true, then we will immediately expand a sparse index to a full one upon reading from disk. This suffices for now, but we will want to add more callers to ensure_full_index() later. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-30 12:57:45 -07:00
Junio C Hamano	52b8c8c716	Merge branch 'ds/maintenance-part-2' "git maintenance", an extended big brother of "git gc", continues to evolve. * ds/maintenance-part-2: maintenance: add incremental-repack auto condition maintenance: auto-size incremental-repack batch maintenance: add incremental-repack task midx: use start_delayed_progress() midx: enable core.multiPackIndex by default maintenance: create auto condition for loose-objects maintenance: add loose-objects task maintenance: add prefetch task	2020-10-27 15:09:47 -07:00
Junio C Hamano	288ed98bf7	Merge branch 'tb/bloom-improvements' "git commit-graph write" learned to limit the number of bloom filters that are computed from scratch with the --max-new-filters option. * tb/bloom-improvements: commit-graph: introduce 'commitGraph.maxNewFilters' builtin/commit-graph.c: introduce '--max-new-filters=<n>' commit-graph: rename 'split_commit_graph_opts' bloom: encode out-of-bounds filters as non-empty bloom/diff: properly short-circuit on max_changes bloom: use provided 'struct bloom_filter_settings' bloom: split 'get_bloom_filter()' in two commit-graph.c: store maximum changed paths commit-graph: respect 'commitGraph.readChangedPaths' t/helper/test-read-graph.c: prepare repo settings commit-graph: pass a 'struct repository *' in more places t4216: use an '&&'-chain commit-graph: introduce 'get_bloom_filter_settings()'	2020-09-29 14:01:20 -07:00
Derrick Stolee	18e449f86b	midx: enable core.multiPackIndex by default The core.multiPackIndex setting has been around since `c4d25228eb` (config: create core.multiPackIndex setting, 2018-07-12), but has been disabled by default. If a user wishes to use the multi-pack-index feature, then they must enable this config and run 'git multi-pack-index write'. The multi-pack-index feature is relatively stable now, so make the config option true by default. For users that do not use a multi-pack-index, the only extra cost will be a file lookup to see if a multi-pack-index file exists (once per process, per object directory). Also, this config option will be referenced by an upcoming "incremental-repack" task in the maintenance builtin, so move the config option into the repository settings struct. Note that if GIT_TEST_MULTI_PACK_INDEX=1, then we want to ignore the config option and treat core.multiPackIndex as enabled. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:04 -07:00
Taylor Blau	b66d84756f	commit-graph: respect 'commitGraph.readChangedPaths' Git uses the 'core.commitGraph' configuration value to control whether or not the commit graph is used when parsing commits or performing a traversal. Now that commit-graphs can also contain a section for changed-path Bloom filters, administrators that already have commit-graphs may find it convenient to use those graphs without relying on their changed-path Bloom filters. This can happen, for example, during a staged roll-out, or in the event of an incident. Introduce 'commitGraph.readChangedPaths' to control whether or not Bloom filters are read. Note that this configuration is independent from both: - 'core.commitGraph', to allow flexibility in using all parts of a commit-graph _except_ for its Bloom filters. - The '--changed-paths' option for 'git commit-graph write', to allow reading and writing Bloom filters to be controlled independently. When the variable is set, pretend as if no Bloom data was specified at all. This avoids adding additional special-casing outside of the commit-graph internals. Suggested-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-09 12:51:48 -07:00
Jonathan Tan	cbe566a071	negotiator/noop: add noop fetch negotiator Add a noop fetch negotiator. This is introduced to allow partial clones to skip the unneeded negotiation step when fetching missing objects using a "git fetch" subprocess. (The implementation of spawning a "git fetch" subprocess will be done in a subsequent patch.) But this can also be useful for end users, e.g. as a blunt fix for object corruption. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-18 13:25:05 -07:00
Xin Li	16af5f1abb	repository: add a helper function to perform repository format upgrade In version 1 of repository format, "extensions" gained special meaning and it is safer to avoid upgrading when there are pre-existing extensions. Make list-objects-filter to use the helper function instead of setting repository version directly as a prerequisite of exposing the upgrade capability. Signed-off-by: Xin Li <delphij@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-05 10:13:30 -07:00
Junio C Hamano	95ca48973d	Merge branch 'jc/missing-ref-store-fix' We've left the command line parsing of "git log :/a/b/" broken for about a full year without anybody noticing, which has been corrected. * jc/missing-ref-store-fix: repository: mark the "refs" pointer as private sha1-name: do not assume that the ref store is initialized	2020-04-22 13:42:55 -07:00
Jeff King	0220461071	repository: mark the "refs" pointer as private The "refs" pointer in a struct repository starts life as NULL, but then is lazily initialized when it is accessed via get_main_ref_store(). However, it's easy for calling code to forget this and access it directly, leading to code which works _some_ of the time, but fails if it is called before anybody else accesses the refs. This was the cause of the bug fixed by `5ff4b920eb` (sha1-name: do not assume that the ref store is initialized, 2020-04-09). In order to prevent similar bugs, let's more clearly mark the "refs" field as private. In addition to helping future code, the name change will help us audit any existing direct uses. Besides get_main_ref_store() itself, it turns out there is only one. But we know it's OK as it is on the line directly after the fix from `5ff4b920eb`, which will have initialized the pointer. However it's still a good idea for it to model the proper use of the accessing function, so we'll convert it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-09 22:40:48 -07:00
Elijah Newren	15beaaa3d1	Fix spelling errors in code comments Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-10 16:00:54 +09:00
Junio C Hamano	098e8c6716	Merge branch 'jk/disable-commit-graph-during-upload-pack' The "upload-pack" (the counterpart of "git fetch") needs to disable commit-graph when responding to a shallow clone/fetch request, but the way this was done made Git panic, which has been corrected. * jk/disable-commit-graph-during-upload-pack: upload-pack: disable commit graph more gently for shallow traversal commit-graph: bump DIE_ON_LOAD check to actual load-time	2019-10-07 11:32:55 +09:00
Jeff King	6abada1880	upload-pack: disable commit graph more gently for shallow traversal When the client has asked for certain shallow options like "deepen-since", we do a custom rev-list walk that pretends to be shallow. Before doing so, we have to disable the commit-graph, since it is not compatible with the shallow view of the repository. That's handled by `829a321569` (commit-graph: close_commit_graph before shallow walk, 2018-08-20). That commit literally closes and frees our repo->objects->commit_graph struct. That creates an interesting problem for commits that have _already_ been parsed using the commit graph. Their commit->object.parsed flag is set, their commit->graph_pos is set, but their commit->maybe_tree may still be NULL. When somebody later calls repo_get_commit_tree(), we see that we haven't loaded the tree oid yet and try to get it from the commit graph. But since it has been freed, we segfault! So the root of the issue is a data dependency between the commit's lazy-load of the tree oid and the fact that the commit graph can go away mid-process. How can we resolve it? There are a couple of general approaches: 1. The obvious answer is to avoid loading the tree from the graph when we see that it's NULL. But then what do we return for the tree oid? If we return NULL, our caller in do_traverse() will rightly complain that we have no tree. We'd have to fallback to loading the actual commit object and re-parsing it. That requires teaching parse_commit_buffer() to understand re-parsing (i.e., not starting from a clean slate and not leaking any allocated bits like parent list pointers). 2. When we close the commit graph, walk through the set of in-memory objects and clear any graph_pos pointers. But this means we also have to "unparse" any such commits so that we know they still need to open the commit object to fill in their trees. So it's no less complicated than (1), and is more expensive (since we clear objects we might not later need). 3. Stop freeing the commit-graph struct. Continue to let it be used for lazy-loads of tree oids, but let upload-pack specify that it shouldn't be used for further commit parsing. 4. Push the whole shallow rev-list out to its own sub-process, with the commit-graph disabled from the start, giving it a clean memory space to work from. I've chosen (3) here. Options (1) and (2) would work, but are non-trivial to implement. Option (4) is more expensive, and I'm not sure how complicated it is (shelling out for the actual rev-list part is easy, but we do then parse the resulting commits internally, and I'm not clear which parts need to be handling shallow-ness). The new test in t5500 triggers this segfault, but see the comments there for how horribly intimate it has to be with how both upload-pack and commit graphs work. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-12 12:30:08 -07:00
Derrick Stolee	50f26bd035	fetch: add fetch.writeCommitGraph config setting The commit-graph feature is now on by default, and is being written during 'git gc' by default. Typically, Git only writes a commit-graph when a 'git gc --auto' command passes the gc.auto setting to actualy do work. This means that a commit-graph will typically fall behind the commits that are being used every day. To stay updated with the latest commits, add a step to 'git fetch' to write a commit-graph after fetching new objects. The fetch.writeCommitGraph config setting enables writing a split commit-graph, so on average the cost of writing this file is very small. Occasionally, the commit-graph chain will collapse to a single level, and this could be slow for very large repos. For additional use, adjust the default to be true when feature.experimental is enabled. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-03 12:06:14 -07:00
Derrick Stolee	aaf633c2ad	repo-settings: create feature.experimental setting The 'feature.experimental' setting includes config options that are not committed to become defaults, but could use additional testing. Update the following config settings to take new defaults, and to use the repo_settings struct if not already using them: * 'pack.useSparse=true' * 'fetch.negotiationAlgorithm=skipping' In the case of fetch.negotiationAlgorithm, the existing logic would load the config option only when about to use the setting, so had a die() statement on an unknown string value. This is removed as now the config is parsed under prepare_repo_settings(). In general, this die() is probably misplaced and not valuable. A test was removed that checked this die() statement executed. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-13 13:33:55 -07:00
Derrick Stolee	ad0fb65999	repo-settings: parse core.untrackedCache The core.untrackedCache config setting is slightly complicated, so clarify its use and centralize its parsing into the repo settings. The default value is "keep" (returned as -1), which persists the untracked cache if it exists. If the value is set as "false" (returned as 0), then remove the untracked cache if it exists. If the value is set as "true" (returned as 1), then write the untracked cache and persist it. Instead of relying on magic values of -1, 0, and 1, split these options into an enum. This allows the use of "-1" as a default value. After parsing the config options, if the value is unset we can initialize it to UNTRACKED_CACHE_KEEP. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-13 13:33:55 -07:00

1 2

80 commits