development/git - HydraGit

mirror of https://github.com/git/git synced 2024-10-30 14:03:28 +00:00

Author	SHA1	Message	Date
Calvin Wan	91c080dff5	git-compat-util: move alloc macros to git-compat-util.h alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for dynamic array allocation. Moving these macros to git-compat-util.h with the other alloc macros focuses alloc.[ch] on allocation for Git objects and additionally allows us to remove inclusions of alloc.h from files that solely used the above macros. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-07-05 11:42:31 -07:00
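The growth policy behind these dynamic-array macros can be illustrated with a small, self-contained sketch. It is modeled on Git's alloc_nr()/ALLOC_GROW() but is not a copy of them: the names `alloc_nr_sketch`, `ALLOC_GROW_SKETCH`, and `build_array()` are hypothetical, and this version calls plain realloc() and abort()s on failure, whereas Git uses its own allocation wrappers that die on failure.

```c
#include <assert.h>
#include <stdlib.h>

/* Growth policy: add a small constant, then grow by roughly 1.5x. */
#define alloc_nr_sketch(x) (((x) + 16) * 3 / 2)

/*
 * Ensure array `x` has room for at least `nr` elements; `alloc` holds
 * the current capacity and is updated in place. Sketch only: aborts on
 * allocation failure instead of using Git's die()-based wrappers.
 */
#define ALLOC_GROW_SKETCH(x, nr, alloc)                              \
	do {                                                         \
		if ((nr) > (alloc)) {                                \
			if (alloc_nr_sketch(alloc) < (nr))           \
				(alloc) = (nr);                      \
			else                                         \
				(alloc) = alloc_nr_sketch(alloc);    \
			(x) = realloc((x), (alloc) * sizeof(*(x)));  \
			if (!(x))                                    \
				abort();                             \
		}                                                    \
	} while (0)

/* Append the integers 0..n-1 to a growable array (hypothetical helper). */
static int *build_array(size_t n, size_t *out_nr, size_t *out_alloc)
{
	int *items = NULL;
	size_t nr = 0, alloc = 0;

	for (size_t i = 0; i < n; i++) {
		ALLOC_GROW_SKETCH(items, nr + 1, alloc);
		items[nr++] = (int)i;
	}
	*out_nr = nr;
	*out_alloc = alloc;
	return items;
}
```

With this policy the capacity goes 24, 60, 114, ... as elements are appended, so the number of reallocations grows only logarithmically with the number of elements.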
Elijah Newren	68d686460f	fsmonitor-ll.h: split this header out of fsmonitor.h This creates a new fsmonitor-ll.h with most of the functions from fsmonitor.h, though it leaves three inline functions where they were. Two-thirds of the files that previously included fsmonitor.h did not need those three inline functions or the six extra includes those inline functions required, so this allows them to only include the lower level header. Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-21 13:39:54 -07:00
Elijah Newren	bc5c5ec044	cache.h: remove this no-longer-used header Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-21 13:39:53 -07:00
Elijah Newren	f5653856c2	name-hash.h: move declarations for name-hash.c from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-06-21 13:39:53 -07:00
Elijah Newren	32a8f51061	environment.h: move declarations for environment.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:53 -07:00
Elijah Newren	f394e093df	treewide: be explicit about dependence on gettext.h Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:51 -07:00
Elijah Newren	36bf195890	alloc.h: move ALLOC_GROW() functions from cache.h This allows us to replace includes of cache.h with includes of the much smaller alloc.h in many places. It does mean that we also need to add includes of alloc.h in a number of C files. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-23 17:25:28 -08:00
Ævar Arnfjörð Bjarmason	6269f8eaad	treewide: always have a valid "index_state.repo" member When the "repo" member was added to "the_index" in [1], repo_read_index() was made to populate it, but the unpopulated "the_index" variable didn't get the same treatment. Let's do that in initialize_the_repository() when we set it up, and likewise for all of the current callers that initialize an empty "struct index_state". This simplifies code that needs to deal with "the_index" or a custom "struct index_state": we no longer need to second-guess this part of the "index_state" deep in the stack. A recent example of such second-guessing is the "istate->repo ? istate->repo : the_repository" code in [2]. We can now simply use "istate->repo". We're doing this by making use of the INDEX_STATE_INIT() macro (and corresponding function) added in [3], which now have mandatory "repo" arguments. Because we now call index_state_init() in repository.c's initialize_the_repository(), we don't need to handle the case where we have a "repo->index" whose "repo" member doesn't match the "repo" we're setting up, i.e. the "Complete the double-reference" code in repo_read_index() being altered here. That logic was originally added in [1], and was working around the lack of what we now have in initialize_the_repository(). For "fsmonitor-settings.c" we can remove the initialization of a NULL "r" argument to "the_repository". This was added back in [4], and was needed at the time for callers that would pass us the "r" from an "istate->repo". Before this change, such a change to "fsmonitor-settings.c" would segfault all over the test suite (e.g. in t0002-gitfile.sh). This change has wider eventual implications for "fsmonitor-settings.c". The other lazy loading behavior in it (starting with "if (!r->settings.fsmonitor) ...") is required because the previously passed "r" could be "NULL".
I have other local changes on top of this which move its configuration reading to "prepare_repo_settings()" in "repo-settings.c", as we could now start to rely on it being called for our "r". But let's leave all of that for now, and narrowly remove this particular part of the lazy-loading. 1. `1fd9ae517c` (repository: add repo reference to index_state, 2021-01-23) 2. `ee1f0c242e` (read-cache: add index.skipHash config option, 2023-01-06) 3. `2f6b1eb794` (cache API: add a "INDEX_STATE_INIT" macro/function, add release_index(), 2023-01-12) 4. `1e0ea5c431` (fsmonitor: config settings are repository-specific, 2022-03-25) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-17 14:32:06 -08:00
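A minimal sketch of why a mandatory "repo" argument removes the need for the `istate->repo ? istate->repo : the_repository` fallback. It assumes heavily simplified stand-ins for `struct repository` and `struct index_state` (the real structures have many more members), and `INDEX_STATE_INIT_SKETCH`, `index_state_init_sketch()`, and `index_gitdir()` are hypothetical names; Git's actual macro may initialize other fields as well.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for Git's structures (illustrative only). */
struct repository {
	const char *gitdir;
};

struct index_state {
	struct repository *repo;
	unsigned int cache_nr;
};

/* The initializer takes a mandatory repository argument. */
#define INDEX_STATE_INIT_SKETCH(r) { .repo = (r) }

/* Function form: reset an index_state to a blank state bound to `r`. */
static void index_state_init_sketch(struct index_state *istate,
				    struct repository *r)
{
	struct index_state blank = INDEX_STATE_INIT_SKETCH(r);
	*istate = blank;
}

/* Callers can rely on istate->repo directly, with no fallback. */
static const char *index_gitdir(const struct index_state *istate)
{
	assert(istate->repo != NULL);
	return istate->repo->gitdir;
}
```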
Ævar Arnfjörð Bjarmason	29fefafcba	sparse-index API: BUG() out on NULL ensure_full_index() Make the ensure_full_index() function stricter, and have it only accept a non-NULL "struct index_state". This function (and this behavior) was added in [1]. One reason it needed to be this lax was its interaction with repo_index_has_changes(). See the addition of that code in [2]. The other reason this was needed dates back to interaction with code added in [3]. In [4] we started calling ensure_full_index() in unpack_trees(), but the caller added in `34110cd4e3` wants to pass us a NULL "dst_index". Let's instead do the NULL check in unpack_trees() itself. 1. `4300f8442a` (sparse-index: implement ensure_full_index(), 2021-03-30) 2. `0c18c059a1` (read-cache: ensure full index, 2021-04-01) 3. `34110cd4e3` (Make 'unpack_trees()' have a separate source and destination index, 2008-03-06) 4. `6863df3550` (unpack-trees: ensure full index, 2021-03-30) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-13 10:36:57 -08:00
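A rough sketch of the resulting division of responsibility, under simplified assumptions: `BUG_SKETCH()` stands in for Git's BUG(), `unpack_trees_sketch()` models only the relevant part of unpack_trees(), and a counter stands in for the actual expansion work. The callee treats NULL as a programming error, and the one caller that may legitimately lack a destination index checks for it itself.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for Git's BUG(): report and abort. */
#define BUG_SKETCH(msg) \
	do { fprintf(stderr, "BUG: %s\n", msg); abort(); } while (0)

struct index_state {
	int sparse;        /* nonzero if sparse directory entries exist */
	int expand_calls;  /* counts expansions, for illustration */
};

static void ensure_full_index_sketch(struct index_state *istate)
{
	/* Stricter contract: a NULL index is a caller bug. */
	if (!istate)
		BUG_SKETCH("ensure_full_index() must get an index!");
	istate->expand_calls++;
	istate->sparse = 0;
}

/* The caller that may have a NULL "dst_index" does the check itself. */
static void unpack_trees_sketch(struct index_state *src,
				struct index_state *dst)
{
	ensure_full_index_sketch(src);
	if (dst)
		ensure_full_index_sketch(dst);
}
```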
Ævar Arnfjörð Bjarmason	d2cdf2c285	sparse-index.c: expand_to_path() can assume non-NULL "istate" This function added in [1] was subsequently used in [2]. All of the calls to it are in name-hash.c, and come after calls to lazy_init_name_hash(istate). The first thing that function does is: if (istate->name_hash_initialized) return; So we can already assume that we have a non-NULL "istate" here, or we'd be segfaulting. Let's not confuse matters by making it appear that's not the case. 1. `71f82d032f` (sparse-index: expand_to_path(), 2021-04-12) 2. `4589bca829` (name-hash: use expand_to_path(), 2021-04-12) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-13 10:36:57 -08:00
Anh Le	8c7abdc596	index: raise a bug if the index is materialised more than once If clear_skip_worktree_from_present_files() encounters a sparse directory, it fully materialises the index, which should expand any sparse directories, and starts going through each entry again. If this happens more than once, raise a BUG. Signed-off-by: Anh Le <anh@canva.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-11-04 20:28:28 -04:00
Anh Le	89aaab11a3	index: add trace2 region for clear skip worktree When using sparse checkout, clear_skip_worktree_from_present_files() must enumerate index entries to find ones with the SKIP_WORKTREE bit to determine whether those index entries exist on disk (in which case their SKIP_WORKTREE bit should be removed). In a large repository, this may take considerable time depending on the size of the index. Add a trace2 region to surface this information, keeping a count of how many paths have been checked. Separately, keep counts after a full index is materialized. Signed-off-by: Anh Le <anh@canva.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2022-11-04 20:28:28 -04:00
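The control flow described in these two commits (expand at most once, restart the scan, and count checked paths before and after the expansion) can be sketched as follows. This is a simplified model, not Git's code: entries are plain structs, `is_present()` checks a caller-supplied list of paths in place of lstat(), `expand_all()` swaps in a precomputed expansion in place of ensure_full_index(), and all names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct entry {
	const char *name;
	int skip_worktree;  /* SKIP_WORKTREE bit */
	int sparse_dir;     /* nonzero for a sparse directory entry */
};

struct index_model {
	struct entry *entries;
	size_t nr;
	/* Hypothetical precomputed expansion used by expand_all(). */
	struct entry *full_entries;
	size_t full_nr;
};

/* Stand-in for lstat(): is `path` in the list of present paths? */
static int is_present(const char *path, const char *const *present, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(path, present[i]))
			return 1;
	return 0;
}

/* Stand-in for ensure_full_index(): switch to the expanded entries. */
static void expand_all(struct index_model *idx)
{
	idx->entries = idx->full_entries;
	idx->nr = idx->full_nr;
}

/*
 * Clear SKIP_WORKTREE on entries present on disk. counts[0] is the
 * number of SKIP_WORKTREE paths checked before any expansion,
 * counts[1] the number checked after the index was fully expanded.
 */
static void clear_skip_worktree_sketch(struct index_model *idx,
				       const char *const *present,
				       size_t present_nr, size_t counts[2])
{
	int restarted = 0;

	counts[0] = counts[1] = 0;
restart:
	for (size_t i = 0; i < idx->nr; i++) {
		struct entry *e = &idx->entries[i];

		if (!e->skip_worktree)
			continue;
		counts[restarted]++;
		if (!is_present(e->name, present, present_nr))
			continue;
		if (e->sparse_dir) {
			/* A fully expanded index must not contain sparse dirs. */
			if (restarted) {
				fprintf(stderr, "BUG: index expanded more than once\n");
				abort();
			}
			expand_all(idx);
			restarted = 1;
			goto restart;
		}
		e->skip_worktree = 0;
	}
}
```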
Junio C Hamano	c276c21da6	Merge branch 'ds/sparse-sparse-checkout' "sparse-checkout" learns to work well with the sparse-index feature. * ds/sparse-sparse-checkout: sparse-checkout: integrate with sparse index p2000: add test for 'git sparse-checkout [add|set]' sparse-index: complete partial expansion sparse-index: partially expand directories sparse-checkout: --no-sparse-index needs a full index cache-tree: implement cache_tree_find_path() sparse-index: introduce partially-sparse indexes sparse-index: create expand_index() t1092: stress test 'git sparse-checkout set' t1092: refactor 'sparse-index contents' test	2022-06-03 14:30:35 -07:00
Derrick Stolee	ac8acb4f2c	sparse-index: complete partial expansion To complete the implementation of expand_to_pattern_list(), we need to detect when a sparse directory entry should remain sparse. This avoids a full expansion, so we now need to use the PARTIALLY_SPARSE mode to indicate this state. There still are no callers to this method, but we will add one in the next change. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-23 11:08:21 -07:00
Derrick Stolee	0243930af4	sparse-index: partially expand directories The expand_to_pattern_list() method expands sparse directory entries to their list of contained files when either the pattern list is NULL or the directory is contained in the new pattern list's cone mode patterns. It is possible that the pattern list has a recursive match with a directory 'A/B/C/' and so an existing sparse directory 'A/B/' would need to be expanded. If there exists a directory 'A/B/D/', then that directory should not be expanded and instead we can create a sparse directory. To implement this, we plug into the add_path_to_index() callback for the call to read_tree_at(). Since we now need access to both the index we are writing and the pattern list we are comparing, create a 'struct modify_index_context' to use as a data transfer object. It is important that we use the given pattern list since we will use this pattern list to change the sparse-checkout patterns and cannot use istate->sparse_checkout_patterns. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-23 11:08:21 -07:00
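The expand-or-keep decision in the 'A/B/' example can be sketched as a pure function. This is a simplified model assuming cone-mode patterns are given as a list of recursively included directories, each ending in '/'; `classify_sparse_dir()` and `enum sparse_dir_action` are hypothetical names, whereas Git makes this decision inside the add_path_to_index() callback using the given pattern list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sparse_dir_action {
	SPARSE_DIR_KEEP,           /* leave as a sparse directory entry */
	SPARSE_DIR_PARENT_OF_CONE, /* expand, then classify its subdirectories */
	SPARSE_DIR_IN_CONE,        /* expand everything beneath it */
};

static int has_prefix(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

/*
 * dir: a sparse directory path ending in '/', e.g. "A/B/".
 * recursive: cone-mode recursive directories, each ending in '/'.
 * Being inside any recursive directory wins over being a parent of one.
 */
static enum sparse_dir_action classify_sparse_dir(const char *dir,
						  const char *const *recursive,
						  size_t nr)
{
	int parent_of_cone = 0;

	for (size_t i = 0; i < nr; i++) {
		if (has_prefix(dir, recursive[i]))
			return SPARSE_DIR_IN_CONE;
		if (has_prefix(recursive[i], dir))
			parent_of_cone = 1;
	}
	return parent_of_cone ? SPARSE_DIR_PARENT_OF_CONE : SPARSE_DIR_KEEP;
}
```

With a recursive match on 'A/B/C/', the existing sparse directory 'A/B/' must be expanded, while its sibling 'A/B/D/' can stay a sparse directory entry.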
Derrick Stolee	9fadb373dd	sparse-index: introduce partially-sparse indexes A future change will present a temporary, in-memory mode where the index can contain sparse directory entries but also not be completely collapsed to the smallest possible sparse directories. This will be necessary for modifying the sparse-checkout definition while using a sparse index. For now, convert the single-bit member 'sparse_index' in 'struct index_state' to be an 'enum sparse_index_mode' with three modes: * INDEX_EXPANDED (0): No sparse directories exist. This is always the case for repositories that do not use cone-mode sparse-checkout. * INDEX_COLLAPSED: Sparse directories may exist. Files outside the sparse-checkout cone are reduced to sparse directory entries whenever possible. * INDEX_PARTIALLY_SPARSE: Sparse directories may exist. Some file entries outside the sparse-checkout cone may exist. Running convert_to_sparse() may further reduce those files to sparse directory entries. The main reason to store this extra information is to allow convert_to_sparse() to short-circuit when the index is already in INDEX_COLLAPSED mode but to actually do the necessary work when in INDEX_PARTIALLY_SPARSE mode. The INDEX_PARTIALLY_SPARSE mode will be used in an upcoming change. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-23 11:08:21 -07:00
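A simplified sketch of the three-valued mode and the short-circuit it makes possible. The enumerator names come from the message; `struct index_state` here is a toy with only the fields the sketch needs, and `convert_to_sparse_sketch()` is a hypothetical stand-in that tracks the mode without touching real entries. It returns early when there is nothing left to collapse (the index is already INDEX_COLLAPSED, or empty) and otherwise does the work, which covers both INDEX_EXPANDED and INDEX_PARTIALLY_SPARSE.

```c
#include <assert.h>

enum sparse_index_mode {
	INDEX_EXPANDED = 0,     /* no sparse directories exist */
	INDEX_COLLAPSED,        /* collapsed as far as possible */
	INDEX_PARTIALLY_SPARSE, /* sparse dirs plus some files outside the cone */
};

struct index_state {
	enum sparse_index_mode sparse_index;
	unsigned int cache_nr;
	int collapse_work;  /* counts real conversion passes (illustrative) */
};

static int convert_to_sparse_sketch(struct index_state *istate)
{
	/* Already as sparse as possible, or empty: nothing to do. */
	if (istate->sparse_index == INDEX_COLLAPSED || !istate->cache_nr)
		return 0;

	/* A real implementation would collapse entries outside the cone. */
	istate->collapse_work++;
	istate->sparse_index = INDEX_COLLAPSED;
	return 0;
}
```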
Derrick Stolee	dce241b020	sparse-index: create expand_index() This is the first change in a series to allow modifying the sparse-checkout pattern set without expanding a sparse index to a full one in the process. Here, we focus on the problem of expanding the pattern set through a command like 'git sparse-checkout add <path>' which needs to create new index entries for the paths now being written to the worktree. To achieve this, we need to be able to replace sparse directory entries with their contained files and subdirectories. Once this is complete, other code paths can discover those cache entries and write the corresponding files to disk before committing the index. We already have logic in ensure_full_index() that expands the index entries, so we will use that as our base. Create a new method, expand_index(), which takes a pattern list, but for now mostly ignores it. The current implementation is only correct when the pattern list is NULL as that does the same as ensure_full_index(). In fact, ensure_full_index() is converted to a shim over expand_index(). A future update will actually implement expand_index() to its full capabilities. For now, it is created and documented. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-23 11:08:21 -07:00
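The shim relationship can be sketched with a toy index whose sparse directory entries already know the file names they contain (the real expand_index() instead reads the corresponding trees). `expand_index_sketch()`, `ensure_full_index_sketch()`, and `push_entry()` are hypothetical names; as in the first version the message describes, the pattern list is accepted but ignored, so the result is only correct when it is NULL, meaning "expand everything".

```c
#include <assert.h>
#include <stddef.h>

struct pattern_list;  /* opaque; ignored by this first version */

#define MAX_ENTRIES 64

struct entry {
	const char *name;
	const char *const *children;  /* non-NULL only for sparse dirs */
	size_t children_nr;
};

struct index_state {
	struct entry entries[MAX_ENTRIES];
	size_t nr;
};

static void push_entry(struct index_state *out, struct entry e)
{
	assert(out->nr < MAX_ENTRIES);
	out->entries[out->nr++] = e;
}

/* Replace every sparse directory entry with the files it contains. */
static void expand_index_sketch(struct index_state *istate,
				struct pattern_list *pl)
{
	struct index_state out = { .nr = 0 };

	(void)pl;  /* not consulted yet: only correct for pl == NULL */
	for (size_t i = 0; i < istate->nr; i++) {
		const struct entry *e = &istate->entries[i];

		if (!e->children) {
			push_entry(&out, *e);
			continue;
		}
		for (size_t j = 0; j < e->children_nr; j++) {
			struct entry file = { e->children[j], NULL, 0 };
			push_entry(&out, file);
		}
	}
	*istate = out;
}

/* The old entry point becomes a thin shim over the new one. */
static void ensure_full_index_sketch(struct index_state *istate)
{
	expand_index_sketch(istate, NULL);
}
```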
Victoria Dye	cfde4cd6ff	sparse-index: expose 'is_sparse_index_allowed()' Expose 'is_sparse_index_allowed()' publicly so that it may be used by callers outside of 'sparse-index.c'. While no such callers exist yet, it will be used in a subsequent commit. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-10 16:45:12 -07:00
Junio C Hamano	82386b4496	Merge branch 'en/present-despite-skipped' In sparse-checkouts, files mis-marked as missing from the working tree could lead to later problems. Such files were hard to discover, and harder to correct. Automatically detecting and correcting the marking of such files has been added to avoid these problems. * en/present-despite-skipped: repo_read_index: add config to expect files outside sparse patterns Accelerate clear_skip_worktree_from_present_files() by caching Update documentation related to sparsity and the skip-worktree bit repo_read_index: clear SKIP_WORKTREE bit from files present in worktree unpack-trees: fix accidental loss of user changes t1011: add testcase demonstrating accidental loss of user modifications	2022-03-09 13:38:23 -08:00
Elijah Newren	ecc7c8841d	repo_read_index: add config to expect files outside sparse patterns Typically with sparse checkouts, we expect files outside the sparsity patterns to be marked as SKIP_WORKTREE and be missing from the working tree. Sometimes, however, this expectation is violated, including in cases such as: * users grabbing files from elsewhere and writing them to the worktree (perhaps by editing a cached copy in an editor, copying/renaming, or even untarring) * various git commands having incomplete or no support for the SKIP_WORKTREE bit[1,2] * users attempting to "abort" a sparse-checkout operation with a not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the working tree is not atomic)[3]. When the SKIP_WORKTREE bit in the index did not reflect the presence of the file in the working tree, it traditionally caused confusion and was difficult to detect and recover from. So, in a sparse checkout, since `af6a51875a` (repo_read_index: clear SKIP_WORKTREE bit from files present in worktree, 2022-01-14), Git automatically clears the SKIP_WORKTREE bit at index read time for entries corresponding to files that are present in the working tree. There is another workflow, however, where it is expected that paths outside the sparsity patterns appear to exist in the working tree and that they do not lose the SKIP_WORKTREE bit, at least until they get modified. A Git-aware virtual file system[4] takes advantage of its position as a file system driver to expose all files in the working tree, fetch them on demand using partial clone on access, and tell Git to pay attention to them on demand by updating the sparse checkout pattern on writes. This means that commands like "git status" only have to examine files that have potentially been modified, whereas commands like "ls" are able to show the entire codebase without requiring manual updates to the sparse checkout pattern.
Thus since `af6a51875a`, Git with such Git-aware virtual file systems unsets the SKIP_WORKTREE bit for all files and commands like "git status" have to fetch and examine them all. Introduce a configuration setting sparse.expectFilesOutsideOfPatterns to allow limiting the tracked set of files to a small set once again. A Git-aware virtual file system or other application that wants to maintain files outside of the sparse checkout can set this in a repository to instruct Git not to check for the presence of SKIP_WORKTREE files. The setting defaults to false, so most users of sparse checkout will still get the benefit of an automatically updating index to recover from the variety of difficult issues detailed in `af6a51875a` for paths with SKIP_WORKTREE set despite the path being present. [1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ [2] The three long paragraphs in the middle of https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/ [4] such as the vfsd described in https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/ Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-01 23:37:48 -08:00
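A minimal sketch of the read-time behavior and its opt-out, assuming a toy index and a caller-supplied list of paths present on disk in place of lstat(). `struct read_options`, its fields `sparse_checkout_enabled` and `expect_files_outside_of_patterns`, and `fix_skip_worktree_on_read()` are hypothetical names standing in for the core.sparseCheckout state and the sparse.expectFilesOutsideOfPatterns setting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct entry {
	const char *name;
	int skip_worktree;
};

struct read_options {
	int sparse_checkout_enabled;          /* sparse checkout in use */
	int expect_files_outside_of_patterns; /* sparse.expectFilesOutsideOfPatterns */
};

static int is_present(const char *path, const char *const *present, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(path, present[i]))
			return 1;
	return 0;
}

/* Returns the number of entries whose SKIP_WORKTREE bit was cleared. */
static size_t fix_skip_worktree_on_read(struct entry *entries, size_t nr,
					const struct read_options *opts,
					const char *const *present,
					size_t present_nr)
{
	size_t cleared = 0;

	/* Only sparse checkouts are affected, and the setting opts out. */
	if (!opts->sparse_checkout_enabled ||
	    opts->expect_files_outside_of_patterns)
		return 0;

	for (size_t i = 0; i < nr; i++) {
		if (entries[i].skip_worktree &&
		    is_present(entries[i].name, present, present_nr)) {
			entries[i].skip_worktree = 0;
			cleared++;
		}
	}
	return cleared;
}
```

A virtual file system that wants paths outside the patterns to keep their SKIP_WORKTREE bit would run `git config sparse.expectFilesOutsideOfPatterns true` in the repository; with the default of false, the clearing shown above happens.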
Junio C Hamano	6249ce2d1b	Merge branch 'ds/sparse-checkout-requires-per-worktree-config' "git sparse-checkout" wants to work with per-worktree configuration, but did not work well in a worktree attached to a bare repository. * ds/sparse-checkout-requires-per-worktree-config: config: make git_configset_get_string_tmp() private worktree: copy sparse-checkout patterns and config on add sparse-checkout: set worktree-config correctly config: add repo_config_set_worktree_gently() worktree: create init_worktree_config() Documentation: add extensions.worktreeConfig details	2022-02-25 15:47:33 -08:00
Derrick Stolee	7316dc5f6f	sparse-checkout: set worktree-config correctly `git sparse-checkout set/init` enables worktree-specific configuration[*] by setting extensions.worktreeConfig=true, but neglects to perform the additional necessary bookkeeping of relocating `core.bare=true` and `core.worktree` from $GIT_COMMON_DIR/config to $GIT_COMMON_DIR/config.worktree, as documented in git-worktree.txt. As a result of this oversight, these settings, which are nonsensical for secondary worktrees, can cause Git commands to incorrectly consider a worktree bare (in the case of `core.bare`) or operate on the wrong worktree (in the case of `core.worktree`). Fix this problem by taking advantage of the recently-added init_worktree_config() which enables `extensions.worktreeConfig` and takes care of necessary bookkeeping. While at it, for backward-compatibility reasons, also stop upgrading the repository format to "1" since doing so is (unintentionally) not required to take advantage of `extensions.worktreeConfig`, as explained by `11664196ac` ("Revert "check_repository_format_gently(): refuse extensions for old repositories"", 2020-07-15). [*] The main reason to use worktree-specific config for the sparse-checkout builtin was to avoid enabling sparse-checkout patterns in one and causing a loss of files in another. If a worktree does not have a sparse-checkout patterns file, then the sparse-checkout logic will not kick in on that worktree. Reported-by: Sean Allred <allred.sean@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-08 09:49:20 -08:00
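The bookkeeping can be modeled with a toy in-memory config. This is only an illustration under stated assumptions: `struct toy_config`, `toy_get()`, `toy_set()`, `toy_unset()`, `toy_move()`, and `init_worktree_config_sketch()` are hypothetical, and the real init_worktree_config() edits config files on disk. The sketch follows the steps the message describes: relocate `core.bare=true` and `core.worktree` from the shared config into the worktree-specific one (config.worktree), and enable `extensions.worktreeConfig`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEYS 16

struct toy_config {
	const char *keys[MAX_KEYS];
	const char *values[MAX_KEYS];
	size_t nr;
};

static const char *toy_get(const struct toy_config *c, const char *key)
{
	for (size_t i = 0; i < c->nr; i++)
		if (!strcmp(c->keys[i], key))
			return c->values[i];
	return NULL;
}

static void toy_unset(struct toy_config *c, const char *key)
{
	for (size_t i = 0; i < c->nr; i++) {
		if (!strcmp(c->keys[i], key)) {
			/* Swap with the last entry; order is not significant. */
			c->keys[i] = c->keys[c->nr - 1];
			c->values[i] = c->values[c->nr - 1];
			c->nr--;
			return;
		}
	}
}

static void toy_set(struct toy_config *c, const char *key, const char *value)
{
	toy_unset(c, key);
	assert(c->nr < MAX_KEYS);
	c->keys[c->nr] = key;
	c->values[c->nr] = value;
	c->nr++;
}

/* Move a setting from one config to another, if it is set. */
static void toy_move(struct toy_config *from, struct toy_config *to,
		     const char *key)
{
	const char *value = toy_get(from, key);

	if (!value)
		return;
	toy_set(to, key, value);
	toy_unset(from, key);
}

static void init_worktree_config_sketch(struct toy_config *common,
					struct toy_config *worktree)
{
	const char *bare = toy_get(common, "core.bare");

	/* As described in the message, only core.bare=true is relocated. */
	if (bare && !strcmp(bare, "true"))
		toy_move(common, worktree, "core.bare");
	toy_move(common, worktree, "core.worktree");
	toy_set(common, "extensions.worktreeConfig", "true");
}
```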
Johannes Schindelin	ae103c37d3	sparse-index: sparse index is disallowed when split index is active In `6e773527b6` (sparse-index: convert from full to sparse, 2021-03-30), we introduced initial support for a sparse index, and were careful to avoid converting to a sparse index in the presence of a split index. However, when we _just_ read a freshly-initialized index, it might not contain a split index even if _writing_ it will add one by virtue of being asked for via the `GIT_TEST_SPLIT_INDEX` variable. We did not notice any problems with checking _only_ for `split_index` (and not `GIT_TEST_SPLIT_INDEX`) right until both `vd/sparse-sparsity-fix-on-read` _and_ `vd/sparse-reset` were merged. Those two topics' interplay triggers a bug in conjunction with running t1091.15 when `GIT_TEST_SPLIT_INDEX=true` in the following way: `vd/sparse-sparsity-fix-on-read` ensures that the index is made sparse right after reading, and `vd/sparse-reset` ensures that the index is made non-sparse again unless running in the `--soft` mode. Since the split index feature is incompatible with the sparse index feature, we see a symptom like this: fatal: position for replacement 4 exceeds base index size 4 Let's fix this by avoiding the conversion to a sparse index when `GIT_TEST_SPLIT_INDEX=true`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-23 17:06:05 -08:00
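The shape of the guard can be sketched as below, assuming a simplified index struct with only a `split_index` pointer. `env_bool_sketch()` is a hypothetical and deliberately minimal stand-in for Git's boolean environment parsing (it accepts only a few lowercase spellings), and `sparse_index_allowed_sketch()` is likewise hypothetical, so treat this as an illustration of the check rather than the real function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct index_state {
	void *split_index;  /* non-NULL when a split index was read */
};

/* Minimal boolean parsing: "1", "true", "yes", "on" count as true. */
static int env_bool_sketch(const char *name, int def)
{
	static const char *const truthy[] = { "1", "true", "yes", "on" };
	const char *v = getenv(name);

	if (!v)
		return def;
	for (size_t i = 0; i < sizeof(truthy) / sizeof(*truthy); i++)
		if (!strcmp(v, truthy[i]))
			return 1;
	return 0;
}

static int sparse_index_allowed_sketch(const struct index_state *istate)
{
	/*
	 * A freshly read index may not contain a split index yet, even
	 * though writing it will add one when the test variable asks for
	 * it, so check both.
	 */
	if (istate->split_index || env_bool_sketch("GIT_TEST_SPLIT_INDEX", 0))
		return 0;
	return 1;
}
```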
Elijah Newren	d79d299352	Accelerate clear_skip_worktree_from_present_files() by caching Trying to clear the skip-worktree bit from files that are present does present some computational overhead for sparse-checkouts. (We do not do the bit clearing in non-sparse-checkouts.) Optimize it as follows: Rather than lstat()'ing every SKIP_WORKTREE path, take advantage of the fact that entire directories will often be missing, especially for cone mode and even more so ever since commit `55dfcf9591` ("sparse-checkout: clear tracked sparse dirs", 2021-09-08). If we have already determined that the parent directory of a file (or other previous ancestor) does not exist, then the file cannot exist either, so we do not need to lstat() it separately. Timings for p2000 are included below, reformatted to fit in normal commit message line lengths; they compare three things: * Timings before this series * Timings of the unoptimized version of clear_skip_worktree_from_present_files() from a few commits ago * Timings after the optimization in this commit (NOTE: t/perf/ appears to have timing resolution only down to 0.01s, which presents significant measurement error when timings only differ by 0.01s. I don't trust any such timings below, and yet all the optimized results differ from the pre-series timings by at most 0.01s.)
Test         Before Series     Unoptimized                Optimized
-----------------------------------------------------------------------------
git status
  full-v3    0.15(0.10+0.06)   0.32(0.16+0.17) +113.3%    0.16(0.10+0.07) +6.7%
  full-v4    0.15(0.11+0.05)   0.32(0.17+0.16) +113.3%    0.16(0.11+0.05) +6.7%
  sparse-v3  0.04(0.03+0.04)   0.04(0.02+0.05) +0.0%      0.04(0.02+0.05) +0.0%
  sparse-v4  0.04(0.03+0.04)   0.04(0.02+0.05) +0.0%      0.04(0.03+0.05) +0.0%
git add -A
  full-v3    0.40(0.30+0.07)   0.56(0.36+0.17) +40.0%     0.39(0.30+0.07) -2.5%
  full-v4    0.37(0.28+0.07)   0.54(0.37+0.16) +45.9%     0.38(0.29+0.07) +2.7%
  sparse-v3  0.06(0.04+0.05)   0.08(0.05+0.05) +33.3%     0.06(0.05+0.04) +0.0%
  sparse-v4  0.05(0.03+0.05)   0.05(0.04+0.04) +0.0%      0.06(0.04+0.05) +20.0%
git add .
  full-v3    0.40(0.31+0.07)   0.57(0.37+0.17) +42.5%     0.41(0.30+0.08) +2.5%
  full-v4    0.38(0.30+0.06)   0.55(0.37+0.16) +44.7%     0.38(0.30+0.06) +0.0%
  sparse-v3  0.06(0.04+0.05)   0.06(0.05+0.04) +0.0%      0.06(0.03+0.05) +0.0%
  sparse-v4  0.06(0.05+0.05)   0.06(0.04+0.05) +0.0%      0.06(0.04+0.06) +0.0%
git commit -a -m A
  full-v3    0.41(0.32+0.06)   0.58(0.39+0.17) +41.5%     0.42(0.32+0.07) +2.4%
  full-v4    0.39(0.30+0.07)   0.56(0.38+0.17) +43.6%     0.40(0.31+0.07) +2.6%
  sparse-v3  0.04(0.03+0.04)   0.04(0.03+0.04) +0.0%      0.04(0.03+0.04) +0.0%
  sparse-v4  0.04(0.03+0.05)   0.04(0.03+0.05) +0.0%      0.04(0.03+0.04) +0.0%
git checkout -f -
  full-v3    0.56(0.46+0.07)   0.73(0.55+0.16) +30.4%     0.57(0.47+0.08) +1.8%
  full-v4    0.54(0.45+0.07)   0.71(0.53+0.17) +31.5%     0.55(0.45+0.07) +1.9%
  sparse-v3  0.06(0.04+0.04)   0.06(0.04+0.05) +0.0%      0.06(0.04+0.05) +0.0%
  sparse-v4  0.05(0.05+0.04)   0.05(0.04+0.05) +0.0%      0.06(0.04+0.05) +20.0%
git reset
  full-v3    0.34(0.26+0.05)   0.51(0.34+0.15) +50.0%     0.34(0.26+0.06) +0.0%
  full-v4    0.32(0.24+0.06)   0.49(0.32+0.15) +53.1%     0.33(0.25+0.06) +3.1%
  sparse-v3  0.04(0.03+0.04)   0.04(0.03+0.04) +0.0%      0.04(0.03+0.04) +0.0%
  sparse-v4  0.03(0.03+0.04)   0.03(0.02+0.04) +0.0%      0.03(0.03+0.04) +0.0%
git reset --hard
  full-v3    0.57(0.46+0.07)   0.90(0.61+0.25) +57.9%     0.57(0.45+0.08) +0.0%
  full-v4    0.54(0.46+0.05)   0.88(0.59+0.26) +63.0%     0.55(0.45+0.07) +1.9%
  sparse-v3  0.07(0.03+0.03)   0.07(0.04+0.03) +0.0%      0.07(0.03+0.03) +0.0%
  sparse-v4  0.06(0.03+0.03)   0.06(0.04+0.02) +0.0%      0.06(0.03+0.03) +0.0%
git reset -- does-not-exist
  full-v3    0.35(0.27+0.06)   0.52(0.32+0.17) +48.6%     0.35(0.27+0.06) +0.0%
  full-v4    0.33(0.26+0.05)   0.50(0.33+0.15) +51.5%     0.33(0.26+0.06) +0.0%
  sparse-v3  0.04(0.03+0.04)   0.04(0.03+0.04) +0.0%      0.04(0.03+0.04) +0.0%
  sparse-v4  0.04(0.02+0.04)   0.03(0.02+0.04) -25.0%     0.03(0.02+0.04) -25.0%
git diff
  full-v3    0.07(0.04+0.04)   0.24(0.11+0.14) +242.9%    0.07(0.04+0.04) +0.0%
  full-v4    0.07(0.03+0.05)   0.24(0.13+0.12) +242.9%    0.08(0.04+0.05) +14.3%
  sparse-v3  0.02(0.01+0.04)   0.02(0.01+0.04) +0.0%      0.02(0.01+0.05) +0.0%
  sparse-v4  0.02(0.02+0.03)   0.02(0.01+0.04) +0.0%      0.02(0.01+0.04) +0.0%
git diff --cached
  full-v3    0.05(0.03+0.02)   0.22(0.12+0.09) +340.0%    0.05(0.03+0.01) +0.0%
  full-v4    0.05(0.03+0.01)   0.23(0.12+0.11) +360.0%    0.05(0.03+0.02) +0.0%
  sparse-v3  0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%      0.01(0.00+0.00) +0.0%
  sparse-v4  0.01(0.00+0.00)   0.01(0.00+0.00) +0.0%      0.01(0.00+0.00) +0.0%
git blame f2/f4/a
  full-v3    0.18(0.13+0.05)   0.52(0.29+0.23) +188.9%    0.19(0.15+0.04) +5.6%
  full-v4    0.19(0.15+0.04)   0.52(0.28+0.23) +173.7%    0.19(0.14+0.04) +0.0%
  sparse-v3  0.10(0.08+0.02)   0.10(0.09+0.01) +0.0%      0.10(0.09+0.01) +0.0%
  sparse-v4  0.10(0.08+0.02)   0.10(0.08+0.02) +0.0%      0.10(0.08+0.02) +0.0%
git blame f2/f4/f3/a
  full-v3    0.45(0.36+0.08)   0.78(0.51+0.27) +73.3%     0.45(0.37+0.08) +0.0%
  full-v4    0.45(0.37+0.08)   0.78(0.51+0.26) +73.3%     0.45(0.37+0.08) +0.0%
  sparse-v3  0.36(0.32+0.04)   0.36(0.31+0.05) +0.0%      0.36(0.31+0.04) +0.0%
  sparse-v4  0.36(0.31+0.05)   0.36(0.31+0.05) +0.0%      0.36(0.31+0.04) +0.0%
git checkout-index -f --all
  full-v3    0.07(0.02+0.05)   0.24(0.12+0.12) +242.9%    0.08(0.04+0.04) +14.3%
  full-v4    0.07(0.03+0.04)   0.24(0.11+0.13) +242.9%    0.08(0.03+0.04) +14.3%
  sparse-v3  0.04(0.01+0.03)   0.04(0.00+0.03) +0.0%      0.04(0.01+0.03) +0.0%
  sparse-v4  0.04(0.01+0.02)   0.04(0.01+0.03) +0.0%      0.04(0.01+0.02) +0.0%
git update-index --add --remove f2/f4/a
  full-v3    0.29(0.23+0.02)   0.46(0.30+0.12) +58.6%     0.30(0.24+0.02) +3.4%
  full-v4    0.27(0.22+0.02)   0.45(0.29+0.12) +66.7%     0.28(0.22+0.03) +3.7%
  sparse-v3  0.02(0.02+0.00)   0.02(0.01+0.00) +0.0%      0.02(0.01+0.00) +0.0%
  sparse-v4  0.02(0.02+0.00)   0.02(0.02+0.00) +0.0%      0.02(0.02+0.00) +0.0%
So, with the optimization, the extra work appears to be essentially 0 for sparse-checkouts that are also using sparse-indexes (even before my optimization), and the extra work appears to be just marginally more than 0 for sparse-checkouts that are using full indexes. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-14 14:44:07 -08:00
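The caching idea can be sketched as follows. This is a simplified model, not Git's implementation: `exists_counted()` checks a caller-supplied list of paths in place of lstat() and counts every check, `path_present_cached()` and `struct dir_cache` are hypothetical names, and only the immediate parent of the most recent missing path is remembered. A path whose cached ancestor is already known to be missing is answered without any check, which pays off when paths arrive in sorted index order so that files in the same missing directory are adjacent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DIR_MAX 256

struct dir_cache {
	char missing[DIR_MAX];  /* directory known to be absent, "" if none */
	size_t checks;          /* existence checks performed so far */
};

/* Stand-in for lstat(): counts every check it performs. */
static int exists_counted(struct dir_cache *c, const char *path,
			  const char *const *fs, size_t fs_nr)
{
	c->checks++;
	for (size_t i = 0; i < fs_nr; i++)
		if (!strcmp(path, fs[i]))
			return 1;
	return 0;
}

/* Returns 1 if `path` is present, 0 otherwise. */
static int path_present_cached(struct dir_cache *c, const char *path,
			       const char *const *fs, size_t fs_nr)
{
	size_t mlen = strlen(c->missing);
	const char *slash;
	char dir[DIR_MAX];
	size_t len;

	/* A known-missing ancestor means the path cannot exist either. */
	if (mlen && !strncmp(path, c->missing, mlen))
		return 0;

	if (exists_counted(c, path, fs, fs_nr))
		return 1;

	/* The path is missing; remember its parent if that is missing too. */
	slash = strrchr(path, '/');
	if (!slash)
		return 0;
	len = (size_t)(slash - path) + 1;  /* keep the trailing '/' */
	if (len >= sizeof(dir))
		return 0;
	memcpy(dir, path, len);
	dir[len] = '\0';
	if (!exists_counted(c, dir, fs, fs_nr))
		memcpy(c->missing, dir, len + 1);
	return 0;
}
```

With five SKIP_WORKTREE paths where three live in a missing `out/x/` directory, this performs five existence checks instead of at least one per path plus parent lookups, because the last two paths under `out/x/` are answered from the cache.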
Elijah Newren	af6a51875a	repo_read_index: clear SKIP_WORKTREE bit from files present in worktree The fix is short (~30 lines), but the description is not. Sorry. There is a set of problems caused by files in what I'll refer to as the "present-despite-SKIP_WORKTREE" state. This commit aims to not just fix these problems, but remove the entire class as a possibility -- for those using sparse checkouts. But first, we need to understand the problems this class presents. A quick outline: * Problems * User facing issues * Problem space complexity * Maintenance and code correctness challenges * SKIP_WORKTREE expectations in Git * Suggested solution * Pros/Cons of suggested solution * Notes on testcase modifications === User facing issues === There are various ways for users to get files to be present in the working copy despite having the SKIP_WORKTREE bit set for that file in the index. This may come from: * various git commands not really supporting the SKIP_WORKTREE bit[1,2] * users grabbing files from elsewhere and writing them to the worktree (perhaps even cached in their editor) * users attempting to "abort" a sparse-checkout operation with a not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the working tree is not atomic)[3]. Once users have present-despite-SKIP_WORKTREE files, any modifications users make to these files will be ignored, possibly to users' confusion. Further: * these files will degrade performance for the sparse-index case due to requiring the index to be expanded (see commit `55dfcf9591` ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why we try to delete entire directories outside the sparse cone). 
* these files will not be updated by standard commands (switch/checkout/pull/merge/rebase will leave them alone unless conflicts happen -- and even then, the conflicted file may be written somewhere else to avoid overwriting the SKIP_WORKTREE file that is present and in the way) * there is nothing in Git that users can use to discover such files (status, diff, grep, etc. all ignore them) * there is no reasonable mechanism to "recover" from such a condition (neither `git sparse-checkout reapply` nor `git reset --hard` will correct it). So, not only are users' modifications ignored, but the files get progressively more stale over time. At some point in the future, they may change their sparseness specification or disable sparse-checkouts. At that time, all present-despite-SKIP_WORKTREE files will show up as having lots of modifications because they represent a version from a different branch or commit. These might include user-made local changes from days before, but the only way to tell is to have users look through them all closely. If these users come to others for help, there will be no logs that explain the issue; it's just a mysterious list of changes. Users might adamantly claim (correctly, as it turns out) that they didn't modify these files, while others presume they did. [1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ [2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/ === Problem space complexity === SKIP_WORKTREE has been part of Git for over a decade. Duy did lots of work on it initially, and several others have since come along and put lots of work into it. Stolee spent most of 2021 on the sparse-index, with lots of bugfixes along the way including to non-sparse-index cases as we are still trying to get sparse checkouts to behave reasonably.
Basically every codepath throughout the tree needs to be aware of an additional type of file: tracked-but-not-present. The extra type results in lots of extra testcases and lots of extra code everywhere. But, the sad thing is that we actually have more than one extra type. We have tracked, tracked-but-not-present (SKIP_WORKTREE), and tracked-but-promised-to-not-be-present-but-is-present-anyway (present-despite-SKIP_WORKTREE). Two types is a monumental amount of effort to support, and adding a third feels a bit like insanity[4]. [4] Some examples of which can be seen at https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ === Maintenance and code correctness challenges === Matheus' patches to grep stalled for nearly a year, in part because of complications of how to handle sparse-checkouts appropriately in all cases[5][6] (with figuring out how to sanely handle present-despite-SKIP_WORKTREE files being one of the complications). His rm/add follow-ups also took months because of those kinds of issues[7]. The corner cases with things like submodules and SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start becoming really complex[8]. We've had to add ugly logic to merge-ort to attempt to handle present-despite-SKIP_WORKTREE files[9], and have basically been forced to give up in merge-recursive knowing full well that we'll sometimes silently discard user modifications. Despite stash essentially being a merge, it needed extra code (beyond what was in merge-ort and merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid a few different bugs that'd result in an early abort with a partial stash application[10].
[5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t and the dates on the thread; also Matheus and I had several conversations off-list trying to resolve the issues over that time [6] ...it finally kind of got unstuck after https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ [7] See for example https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t and quotes like "The core functionality of sparse-checkout has always been only partially implemented", a statement I still believe is true today. [8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/ [9] See commit `66b209b86a` ("merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries", 2021-03-20) [10] See commit `ba359fd507` ("stash: fix stash application in sparse-checkouts", 2020-12-01) === SKIP_WORKTREE expectations in Git === A couple quotes: * From [11] (before the "sparse-checkout" command existed): If it needs too many special cases, hacks, and conditionals, then it is not worth the complexity---if it is easier to write a correct code by allowing Git to populate working tree files, it is perfectly fine to do so. In a sense, the sparse checkout "feature" itself is a hack by itself, and that is why I think this part should be "best effort" as well. * From the git-sparse-checkout manual (still present today): THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE. [11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ === Suggested solution === SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as the name of the option implies, to allow the file to NOT be in the worktree but consider it to be unchanged rather than deleted. 
This suggests a simple solution: present-despite-SKIP_WORKTREE files should not exist, for those using sparse-checkouts. Enforce this at index loading time by checking if core.sparseCheckout is true; if so, check files in the index with the SKIP_WORKTREE bit set to verify that they are absent from the working tree. If they are present, unset the bit (in memory, though any commands that write to the index will record the update). Users can, of course, get the SKIP_WORKTREE bit back, for example by running `git sparse-checkout reapply` (if they have ensured the file is unmodified and doesn't match the specified sparsity patterns). === Pros/Cons of suggested solution === Pros: * Solves the user-visible problems reported above, which I've been complaining about for nearly a year but couldn't find a solution to. * Helps prevent slow performance degradation with a sparse-index. * Much easier behavior in sparse-checkouts for users to reason about. * Very simple, ~30 lines of code. * Significantly simplifies some ugly testcases, and obviates the need to test an entire class of potential issues. * Reduces code complexity, reasoning, and maintenance. Avoids disagreements about weird corner cases[12]. * It has been reported that some users might be (ab)using SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree mechanism[13, and a few other similar references]. These users know of multiple caveats and shortcomings in doing so; perhaps not surprising given the "SKIP_WORKTREE expectations" section above. However, these users use `git update-index --skip-worktree`, and not `git sparse-checkout` or core.sparseCheckout=true. As such, these users would be unaffected by this change and can continue abusing the system as before.
[12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree Cons: * When core.sparseCheckout is enabled, this adds a performance cost to reading the index. I'll defer discussion of this cost to a subsequent patch, since I have some optimizations to add. === Notes on testcase modifications === The good: * t1011: Compare to two cases above it ('read-tree will not throw away dirty changes, non-sparse'); since the file is present, it should match the non-sparse case now * t1092: sparse-index & sparse-checkout now match full-worktree behavior in more cases! Yaay for consistency! * t6428, t7012: look at how much simpler the tests become! Merge and stash can just fail early telling the user there's a file in the way, instead of not noticing until it's about to write a file and then have to implement sudden crash avoidance. Hurray for sanity! * t7817: sparse behavior better matches full tree behavior. Hurray for sanity! The confusing: * t3705: These changes were ONLY needed on Windows, but they don't hurt other platforms. Let's discuss each individually: * core.sparseCheckout should be false by default. Nothing in this testcase toggles that until many, many tests later. However, early tests (#5 in particular) were testing `update-index --skip-worktree` behavior in a non-sparse-checkout, but the Windows tests in CI were behaving as if core.sparseCheckout=true had been specified somewhere. I do not have access to a Windows machine. But I just manually did what should have been a no-op and turned the config off. And it fixed the test. * I have no idea why the leftover .gitattributes file from this test was causing failures for test #18 on Windows, but only with these changes of mine. Test #18 was checking for empty stderr, and specifically wanted to know that some error completely unrelated to file endings did not appear. 
The leftover .gitattributes file thus caused some spurious stderr unrelated to the thing being checked. Since other tests did not intend to test normalization, just proactively remove the .gitattributes file. I'm certain this is cleaner and better, I'm just unsure why/how this didn't trigger problems before. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-14 14:43:22 -08:00
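The load-time rule from af6a51875a can be sketched in a few lines. This is a minimal illustration, not the commit's code. The `toy_` struct and the flag value are hypothetical, and checking presence with POSIX lstat() is an assumption. The rule is: when sparse checkout is on, any entry that has SKIP_WORKTREE but whose path exists on disk loses the bit in memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical flag value for this sketch only. */
#define TOY_SKIP_WORKTREE (1u << 30)

struct toy_entry {
	const char *name;   /* path relative to the current directory */
	unsigned int flags;
};

/*
 * Clear SKIP_WORKTREE from entries whose path is present on disk.
 * Does nothing unless sparse checkout is enabled, so users of plain
 * `git update-index --skip-worktree` are unaffected.
 * Returns how many entries were changed.
 */
static size_t toy_clear_skip_worktree_if_present(struct toy_entry *entries,
						 size_t nr, int sparse_checkout)
{
	struct stat st;
	size_t cleared = 0;

	assert(entries || !nr);
	if (!sparse_checkout)
		return 0;

	for (size_t i = 0; i < nr; i++) {
		if (!(entries[i].flags & TOY_SKIP_WORKTREE))
			continue;
		if (!lstat(entries[i].name, &st)) {   /* present despite SKIP_WORKTREE */
			entries[i].flags &= ~TOY_SKIP_WORKTREE;
			cleared++;
		}
	}
	return cleared;
}
```

The change is in memory only. Any later command that writes the index will then persist the cleared bit, which matches the behavior the commit message describes.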
Victoria Dye	b93fea08d2	sparse-index: add ensure_correct_sparsity function The `ensure_correct_sparsity` function is intended to provide a means of aligning the in-core index with the sparsity required by the repository settings and other properties of the index. The function first checks whether a sparse index is allowed (per repository & sparse checkout pattern settings). If the sparse index may be used, the index is converted to sparse; otherwise, it is explicitly expanded with `ensure_full_index`. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-24 16:32:38 -08:00
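The decision that `ensure_correct_sparsity` makes is a two-way branch: convert to sparse when allowed, otherwise expand fully. In the sketch below, the `toy_` types are hypothetical simplifications. So are the two booleans standing in for "sparse index allowed". The real check consults more repository and pattern state than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified index state. */
struct toy_index {
	bool sparse;          /* holds sparse-directory entries */
};

/* Hypothetical stand-ins for the settings the real check consults. */
struct toy_settings {
	bool index_sparse;    /* like index.sparse */
	bool cone_mode;       /* like cone-mode sparse-checkout patterns */
};

static bool toy_sparse_allowed(const struct toy_settings *s)
{
	return s->index_sparse && s->cone_mode;
}

static void toy_convert_to_sparse(struct toy_index *istate)
{
	istate->sparse = true;
}

static void toy_ensure_full_index(struct toy_index *istate)
{
	istate->sparse = false;
}

/* Align the in-memory index with what the settings permit. */
static void toy_ensure_correct_sparsity(struct toy_index *istate,
					const struct toy_settings *s)
{
	assert(istate && s);
	if (toy_sparse_allowed(s))
		toy_convert_to_sparse(istate);
	else
		toy_ensure_full_index(istate);
}
```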
Victoria Dye	13f69f3082	sparse-index: avoid unnecessary cache tree clearing When converting a full index to sparse, clear and recreate the cache tree only if the cache tree is not fully valid. The convert_to_sparse operation should exit silently if a cache tree update cannot be successfully completed (e.g., due to a conflicted entry state). However, because this failure scenario only occurs when at least a portion of the cache tree is invalid, we can save ourselves the cost of clearing and recreating the cache tree by skipping the check when the cache tree is fully valid. Helped-by: Derrick Stolee <dstolee@microsoft.com> Co-authored-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-24 16:32:38 -08:00
Junio C Hamano	dc89c34d9e	Merge branch 'ds/sparse-index-ignored-files' In cone mode, the sparse-index code path learned to remove ignored files (like build artifacts) outside the sparse cone, allowing the entire directory outside the sparse cone to be removed, which is especially useful when the sparse patterns change. * ds/sparse-index-ignored-files: sparse-checkout: clear tracked sparse dirs sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag attr: be careful about sparse directories sparse-checkout: create helper methods sparse-index: use WRITE_TREE_MISSING_OK sparse-index: silently return when cache tree fails unpack-trees: fix nested sparse-dir search sparse-index: silently return when not using cone-mode patterns t7519: rewrite sparse index test	2021-09-20 15:20:44 -07:00
Derrick Stolee	ce7a9f0141	sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag The convert_to_sparse() method checks for the GIT_TEST_SPARSE_INDEX environment variable or the "index.sparse" config setting before converting the index to a sparse one. This is for ease of use since all current consumers are preparing to compress the index before writing it to disk. If these settings are not enabled, then convert_to_sparse() silently returns without doing anything. We will add a consumer in the next change that wants to use the sparse index as an in-memory data structure, regardless of whether the on-disk format should be sparse. To that end, create the SPARSE_INDEX_MEMORY_ONLY flag that will skip these config checks when enabled. All current consumers are modified to pass '0' in the new 'flags' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-07 22:41:10 -07:00
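A sketch of how a flag like SPARSE_INDEX_MEMORY_ONLY lets a caller bypass the on-disk-format checks. The flag name comes from the commit, but its numeric value here is an assumption. The helper functions are hypothetical. For simplicity, the environment check only accepts the literal string "1", whereas Git's own boolean parsing is more permissive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Flag name from the commit; the value is an assumption. */
#define SPARSE_INDEX_MEMORY_ONLY (1 << 0)

/* Hypothetical: GIT_TEST_SPARSE_INDEX=1 or index.sparse requests sparse. */
static bool toy_sparse_index_requested(bool index_sparse_config)
{
	const char *env = getenv("GIT_TEST_SPARSE_INDEX");

	if (env && !strcmp(env, "1"))
		return true;
	return index_sparse_config;
}

/* Memory-only callers skip the on-disk format settings entirely. */
static bool toy_may_convert_to_sparse(int flags, bool index_sparse_config)
{
	if (flags & SPARSE_INDEX_MEMORY_ONLY)
		return true;
	return toy_sparse_index_requested(index_sparse_config);
}
```

Existing callers pass 0 and keep their old behavior. Only the new in-memory consumer sets the flag.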
Derrick Stolee	02155c8c00	sparse-checkout: create helper methods As we integrate the sparse index into more builtins, we occasionally need to check the sparse-checkout patterns to see if a path is within the sparse-checkout cone. Create some helper methods that help initialize the patterns and check for pattern matching to make this easier. The existing callers of commands like get_sparse_checkout_patterns() use a custom 'struct pattern_list' that is not necessarily the one in the 'struct index_state', so there are not many previous uses that could adopt these helpers. There are just two in builtin/add.c and sparse-index.c that can use path_in_sparse_checkout(). We add a path_in_cone_mode_sparse_checkout() as well that will only return false if the path is outside of the sparse-checkout definition _and_ the sparse-checkout patterns are in cone mode. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-07 22:41:10 -07:00
Derrick Stolee	8a96b9d0a7	sparse-index: use WRITE_TREE_MISSING_OK When updating the cache tree in convert_to_sparse(), the WRITE_TREE_MISSING_OK flag indicates that trees might be computed that do not already exist within the object database. This happens in cases such as 'git add' creating new trees that it wants to store in anticipation of a following 'git commit'. If this flag is not specified, then it might trigger a promisor fetch or a failure due to the object not existing locally. Use WRITE_TREE_MISSING_OK during convert_to_sparse() to avoid these possible reasons for the cache_tree_update() to fail. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-07 22:41:09 -07:00
Derrick Stolee	5dc16756b2	sparse-index: silently return when cache tree fails If cache_tree_update() returns a non-zero value, then it could not create the cache tree. This is likely due to a path having a merge conflict. Since we are already returning early, let's return silently to avoid making it seem like we failed to write the index at all. If we remove our dependence on the cache tree within convert_to_sparse(), then we could still recover from this scenario and have a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-07 22:41:09 -07:00
Derrick Stolee	e27eab45c7	sparse-index: silently return when not using cone-mode patterns While the sparse-index is only enabled when core.sparseCheckoutCone is also enabled, it is possible for the user to modify the sparse-checkout file manually in a way that does not match cone-mode patterns. In this case, we should refuse to convert an index into a sparse index, since the sparse_checkout_patterns will not be initialized with recursive and parent path hashsets. Also silently return if there are no cache entries, which is a simple case: there are no paths to make sparse! Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-07 22:41:09 -07:00
Jeff Hostetler	d9e9b44d7a	sparse-index: copy dir_hash in ensure_full_index() Copy the 'index_state->dir_hash' back to the real istate after expanding a sparse index. A crash was observed in 'git status' during some hashmap lookups with corrupted hashmap entries. During an index expansion, new cache-entries are added to the 'index_state->name_hash' and the 'dir_hash' in a temporary 'index_state' variable 'full'. However, only the 'name_hash' hashmap from this temp variable was copied back into the real 'istate' variable. The original copy of the 'dir_hash' was incorrectly preserved. If the table in the 'full->dir_hash' hashmap were reallocated, the stale version (in 'istate') would be corrupted. The test suite does not operate on index sizes sufficiently large to trigger this reallocation, so it does not cover this behavior. Increasing the test suite to cover such scale is fragile and likely wasteful. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-30 09:24:12 -07:00
Derrick Stolee	f934f1b47f	sparse-index: recompute cache-tree When some commands run with command_requires_full_index=1, then the index can get in a state where the in-memory cache tree is actually equal to the sparse index's cache tree instead of the full one. This results in incorrect entry_count values. By clearing the cache tree before converting to sparse, we avoid this issue. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 15:05:53 -07:00
Derrick Stolee	f8fe49e539	fsmonitor: integrate with sparse index If we need to expand a sparse-index into a full one, then the FS Monitor bitmap is going to be incorrect. Ensure that we start fresh at such an event. While this is currently a performance drawback, the eventual hope of the sparse-index feature is that these expansions will be rare and hence we will be able to keep the FS Monitor data accurate across multiple Git commands. These tests are added to demonstrate that the behavior is the same across a full index and a sparse index, but also that file modifications to a tracked directory outside of the sparse cone will trigger ensure_full_index(). Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:49 -07:00
Derrick Stolee	47410778fb	sparse-index: include EXTENDED flag when expanding When creating a full index from a sparse one, we create cache entries for every blob within a given sparse directory entry. These are correctly marked with the CE_SKIP_WORKTREE flag, but the CE_EXTENDED flag is not included. The CE_EXTENDED flag would exist if we loaded a full index from disk with these entries marked with CE_SKIP_WORKTREE, so we can add the flag here to be consistent. This allows us to directly compare the flags present in cache entries when testing the sparse-index feature, but has no significance to its correctness in the user-facing functionality. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:21 -07:00
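A minimal sketch of the consistency fix described in 47410778fb. When an entry is created by expanding a sparse directory, it gets CE_EXTENDED alongside CE_SKIP_WORKTREE, so its flags match those of an entry loaded from disk. The `TOY_` constants are meant to mirror Git's cache-entry flags, but treat the exact values as assumptions.

```c
#include <assert.h>

/* Intended to mirror Git's flags; exact values are assumptions. */
#define TOY_CE_EXTENDED      0x4000u
#define TOY_CE_SKIP_WORKTREE (1u << 30)

struct toy_entry {
	unsigned int ce_flags;
};

/* Mark an entry produced by expanding a sparse-directory entry. */
static void toy_mark_expanded_entry(struct toy_entry *ce)
{
	assert(ce);
	ce->ce_flags |= TOY_CE_SKIP_WORKTREE | TOY_CE_EXTENDED;
}
```

With both bits set, tests can compare entry flags directly between a full index and an expanded sparse one.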
Derrick Stolee	fc6609d198	sparse-index: skip indexes with unmerged entries The sparse-index format is designed to be compatible with merge conflicts, even those outside the sparse-checkout definition. The reason is that when converting a full index to a sparse one, a cache entry with nonzero stage will not be collapsed into a sparse directory entry. However, this behavior was not tested, and a different behavior within convert_to_sparse() fails in this scenario. Specifically, cache_tree_update() will fail when unmerged entries exist. convert_to_sparse_rec() uses the cache-tree data to recursively walk the tree structure, but also to compute the OIDs used in the sparse-directory entries. Add an index scan to convert_to_sparse() that will detect if these merge conflict entries exist and skip the conversion before trying to update the cache-tree. This is marked as NEEDSWORK because this can be removed with a suitable update to cache_tree_update() or a similar method that can construct a cache-tree with invalid nodes, but still allow creating the nodes necessary for creating sparse directory entries. It is possible that in the future we will not need to make such an update, since if we do not expand a sparse-index into a full one, this conversion does not need to happen. Thus, this can be deferred until the merge machinery is made to integrate with the sparse-index. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:21 -07:00
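The guard added in fc6609d198 is an index scan that looks for any entry with a nonzero stage before the cache tree is updated. Here is a sketch of that scan over a hypothetical `toy_entry` array (stage 0 means merged; stages 1 to 3 are conflict sides).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_entry {
	const char *name;
	unsigned int stage;   /* 0 = merged, 1-3 = conflict stages */
};

/* True if any entry is unmerged; the caller then skips conversion. */
static bool toy_index_has_unmerged_entries(const struct toy_entry *entries,
					   size_t nr)
{
	assert(entries || !nr);
	for (size_t i = 0; i < nr; i++)
		if (entries[i].stage)
			return true;
	return false;
}
```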
Junio C Hamano	107691cb07	Merge branch 'ds/sparse-index-protections' Fix access to uninitialized piece of memory, introduced during this cycle. * ds/sparse-index-protections: sparse-index: fix uninitialized jump	2021-05-21 05:50:38 +09:00
Derrick Stolee	4279cb1c6e	sparse-index: fix uninitialized jump While testing the sparse-index, I verified a test with --valgrind and it complained about an uninitialized value being used in a jump in the path_matches_pattern_list() method. The line was this one: if (*dtype == DT_UNKNOWN) In the call stack, the culprit was the initialization of the dtype variable in convert_to_sparse_rec(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-05-18 06:29:17 +09:00
Ævar Arnfjörð Bjarmason	b79f9c075d	sparse-index.c: remove set_index_sparse_config() Remove the set_index_sparse_config() function by folding it into set_sparse_index_config(), which was its only user. Since `122ba1f7b5` (sparse-checkout: toggle sparse index from builtin, 2021-03-30) the flow of this code hasn't made much sense: we'd get "enabled" in set_sparse_index_config() and proceed to call set_index_sparse_config() with it. There we'd call prepare_repo_settings() and set "repo->settings.sparse_index = 1", only to needlessly call prepare_repo_settings() again in set_sparse_index_config() (where it would abort early), and finally set "repo->settings.sparse_index = enabled". Instead we can just call prepare_repo_settings() once, and set the variable to "enabled" in the first place. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-05-06 12:53:46 +09:00
Derrick Stolee	71f82d032f	sparse-index: expand_to_path() Some users of the index API have a specific path they are looking for, but choose to use index_file_exists() to rely on the name-hash hashtable instead of doing binary search with index_name_pos(). These users only need to know a yes/no answer, not a position within the cache array. When the index is sparse, the name-hash hash table does not contain the full list of paths within sparse directories. It _does_ contain the directory names for the sparse-directory entries. Create a helper function, expand_to_path(), for intended use with the name-hash hashtable functions. The integration with name-hash.c will follow in a later change. The solution here is to use ensure_full_index() when we determine that the requested path is within a sparse directory entry. This will populate the name-hash hashtable as the index is recomputed from scratch. There may be cases where the caller is trying to find an untracked path that is not in the index but also is not within a sparse directory entry. We want to minimize the overhead for these requests. If we used index_name_pos() to find the insertion order of the path, then we could determine from that position if a sparse-directory exists. (In fact, just calling index_name_pos() in that case would lead to expanding the index to a full index.) However, this takes O(log N) time where N is the number of cache entries. To keep the performance of this call based mostly on the input string, use index_file_exists() to look for the ancestors of the path. Using the heuristic that a sparse directory is likely to have a small number of parent directories, we start from the bottom and build up. Use a string buffer to allow mutating the path name to terminate after each slash for each hashset test. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-14 13:47:54 -07:00
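The ancestor test behind expand_to_path() can be sketched like this. The sketch makes a mutable copy of the path, terminates it right after each slash, and looks up each leading-directory prefix. A match means the prefix is a sparse-directory entry, because only those names end in '/'. A linear search over a string array stands in for the name-hash. It is hypothetical, as are the `toy_` names, and the sketch walks from the top-level directory downward, which is a choice of this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for index_file_exists() on the name-hash. */
static bool toy_name_exists(const char *const *names, size_t nr,
			    const char *name)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(names[i], name))
			return true;
	return false;
}

/* True if some leading directory of `path` is a sparse-directory entry. */
static bool toy_path_in_sparse_dir(const char *const *names, size_t nr,
				   const char *path)
{
	size_t len = strlen(path);
	char *buf = malloc(len + 1);
	bool found = false;

	assert(names || !nr);
	if (!buf)
		return false;
	memcpy(buf, path, len + 1);

	for (char *slash = strchr(buf, '/'); slash; slash = strchr(slash + 1, '/')) {
		char saved = slash[1];

		slash[1] = '\0';              /* keep the trailing '/' */
		found = toy_name_exists(names, nr, buf);
		slash[1] = saved;             /* restore before continuing */
		if (found)
			break;
	}
	free(buf);
	return found;
}
```

Only when this returns true would the caller need to pay for a full expansion. Untracked paths outside every sparse directory cost one lookup per path component.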
Derrick Stolee	2de37c536d	cache-tree: integrate with sparse directory entries The cache-tree extension was previously disabled with sparse indexes. However, the cache-tree is an important performance feature for commands like 'git status' and 'git add'. Integrate it with sparse directory entries. When writing a sparse index, completely clear and recalculate the cache tree. By starting from scratch, the only integration necessary is to check if we hit a sparse directory entry and create a leaf of the cache-tree that has an entry_count of one and no subtrees. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-30 12:57:48 -07:00
Derrick Stolee	122ba1f7b5	sparse-checkout: toggle sparse index from builtin The sparse index extension is used to signal that index writes should be in sparse mode. This was only updated using GIT_TEST_SPARSE_INDEX=1. Add a '--[no-]sparse-index' option to 'git sparse-checkout init' that specifies if the sparse index should be used. It also updates the index to use the correct format, either way. Add a warning in the documentation that the use of a repository extension might reduce compatibility with third-party tools. 'git sparse-checkout init' already sets extensions.worktreeConfig, which places most sparse-checkout users outside of the scope of most third-party tools. Update t1092-sparse-checkout-compatibility.sh to use this CLI instead of GIT_TEST_SPARSE_INDEX=1. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-30 12:57:48 -07:00
Derrick Stolee	58300f4743	sparse-index: add index.sparse config option When enabled, this config option signals that index writes should attempt to use sparse-directory entries. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-30 12:57:47 -07:00
Derrick Stolee	f442313e2e	submodule: sparse-index should not collapse links A submodule is stored as a "Git link" that actually points to a commit within a submodule. Submodules are populated or not depending on submodule configuration, not sparse-checkout. To ensure that the sparse-index feature integrates correctly with submodules, we should not collapse a directory if there is a Git link within its range. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-30 12:57:47 -07:00
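A sketch of the per-directory collapse test implied by this commit and by 6e773527b6. A directory range may become a single sparse-directory entry only if it contains no Git link (submodule), no unmerged entry, and no entry missing CE_SKIP_WORKTREE. 0160000 is the mode Git uses for gitlinks. The `toy_` types and the skip-worktree flag value are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SKIP_WORKTREE (1u << 30)   /* hypothetical flag value */
#define TOY_S_IFGITLINK   0160000u     /* gitlink (submodule) mode */

struct toy_entry {
	unsigned int mode;
	unsigned int flags;
	unsigned int stage;
};

/* May every entry in this directory range be replaced by one entry? */
static bool toy_can_collapse_range(const struct toy_entry *entries, size_t nr)
{
	assert(entries || !nr);
	for (size_t i = 0; i < nr; i++) {
		if (entries[i].mode == TOY_S_IFGITLINK)
			return false;  /* submodules follow submodule config */
		if (entries[i].stage)
			return false;  /* keep conflicts visible */
		if (!(entries[i].flags & TOY_SKIP_WORKTREE))
			return false;  /* unexpectedly not skip-worktree */
	}
	return true;
}
```

A directory that fails this check is not collapsed. Its subdirectories can still be examined on their own, as the conversion commit describes.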
Derrick Stolee	6e773527b6	sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." 
We can compare the behavior of the sparse-index in t1092-sparse-checkout-compatibility.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-30 12:57:47 -07:00
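The four preconditions listed in 6e773527b6 can be written as an early-return predicate, plus the temporary test-only gate. This is a sketch. The `toy_` structs and their boolean fields are hypothetical stand-ins for state that Git keeps elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_index_state {
	bool split_index;
	bool sparse_index;
};

struct toy_repo_settings {
	bool sparse_checkout;     /* sparse-checkout enabled */
	bool cone_mode;           /* patterns are in cone mode */
	bool test_sparse_index;   /* stand-in for GIT_TEST_SPARSE_INDEX */
};

static bool toy_should_convert_to_sparse(const struct toy_index_state *istate,
					 const struct toy_repo_settings *s)
{
	assert(istate && s);
	if (istate->split_index)      /* 1. the index is split */
		return false;
	if (istate->sparse_index)     /* 2. already sparse */
		return false;
	if (!s->sparse_checkout)      /* 3. sparse-checkout disabled */
		return false;
	if (!s->cone_mode)            /* 4. not using cone mode */
		return false;
	return s->test_sparse_index;  /* temporary environment-variable gate */
}
```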
Derrick Stolee	4300f8442a	sparse-index: implement ensure_full_index() We will mark an in-memory index_state as having sparse directory entries with the sparse_index bit. These currently cannot exist, but we will add a mechanism for collapsing a full index to a sparse one in a later change. That will happen at write time, so we must first allow parsing the format before writing it. Commands or methods that require a full index in order to operate can call ensure_full_index() to expand that index in-memory. This requires parsing trees using that index's repository. Sparse directory entries have a specific 'ce_mode' value. The macro S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type. This ce_mode is not possible with the existing index formats, so we don't also verify all properties of a sparse-directory entry, which are: 1. ce->ce_mode == 0040000 2. ce->flags & CE_SKIP_WORKTREE is true 3. ce->name[ce->namelen - 1] == '/' (ends in dir separator) 4. ce->oid references a tree object. These are all semi-enforced in ensure_full_index() to some extent. Any deviation will cause a warning at minimum or a failure in the worst case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-30 12:57:45 -07:00
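Properties 1 to 3 of a sparse-directory entry can be checked locally. Property 4 (the OID names a tree) needs the object database, so this sketch skips it. The `toy_` struct, macro, and flag value are hypothetical. The macro mirrors the commit's description of S_ISSPARSEDIR() as a comparison against the directory mode 0040000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CE_SKIP_WORKTREE (1u << 30)          /* hypothetical value */
#define TOY_S_ISSPARSEDIR(m) ((m) == 0040000u)

struct toy_cache_entry {
	unsigned int ce_mode;
	unsigned int ce_flags;
	size_t namelen;
	const char *name;
};

static bool toy_is_well_formed_sparse_dir(const struct toy_cache_entry *ce)
{
	assert(ce);
	/* Property 4 (oid references a tree) is not checked here. */
	return TOY_S_ISSPARSEDIR(ce->ce_mode) &&            /* 1. mode */
	       (ce->ce_flags & TOY_CE_SKIP_WORKTREE) &&      /* 2. skip-worktree */
	       ce->namelen > 0 &&
	       ce->name[ce->namelen - 1] == '/';             /* 3. trailing slash */
}
```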
Derrick Stolee	3964fc2aae	sparse-index: add guard to ensure full index Upcoming changes will introduce modifications to the index format that allow sparse directories. It will be useful to have a mechanism for converting those sparse index files into full indexes by walking the tree at those sparse directories. Name this method ensure_full_index() as it will guarantee that the index is fully expanded. This method is not implemented yet, and instead we focus on the scaffolding to declare it and call it at the appropriate time. Add a 'command_requires_full_index' member to struct repo_settings. This will be an indicator that we need the index in full mode to do certain index operations. This starts as being true for every command, then we will set it to false as some commands integrate with sparse indexes. If 'command_requires_full_index' is true, then we will immediately expand a sparse index to a full one upon reading from disk. This suffices for now, but we will want to add more callers to ensure_full_index() later. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-30 12:57:45 -07:00
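A sketch of the read-time guard. If the running command has not declared itself sparse-aware, a sparse index is expanded right after reading. The structs, the expansion counter, and the toy expansion body are hypothetical. A real expansion would walk trees to recreate the entries.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_repo_settings {
	bool command_requires_full_index;
};

struct toy_index_state {
	bool sparse_index;
	int expansions;   /* counts expansions, for illustration only */
};

static void toy_ensure_full_index(struct toy_index_state *istate)
{
	if (!istate->sparse_index)
		return;
	/* A real implementation would walk trees to rebuild entries here. */
	istate->sparse_index = false;
	istate->expansions++;
}

/* Called right after the index is read from disk. */
static void toy_after_read(struct toy_index_state *istate,
			   const struct toy_repo_settings *s)
{
	assert(istate && s);
	if (s->command_requires_full_index)
		toy_ensure_full_index(istate);
}
```

Defaulting the setting to true is the conservative choice. Each command opts out only once it has been taught to handle sparse-directory entries.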

49 commits