git/t/t1092-sparse-checkout-compatibility.sh

2335 lines
72 KiB
Bash
Raw Normal View History

#!/bin/sh
test_description='compare full workdir to sparse workdir'
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
GIT_TEST_SPLIT_INDEX=0
GIT_TEST_SPARSE_INDEX=
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
. ./test-lib.sh
test_expect_success 'setup' '
git init initial-repo &&
(
GIT_TEST_SPARSE_INDEX=0 &&
cd initial-repo &&
echo a >a &&
echo "after deep" >e &&
echo "after folder1" >g &&
echo "after x" >z &&
mkdir folder1 folder2 deep before x &&
echo "before deep" >before/a &&
echo "before deep again" >before/b &&
mkdir deep/deeper1 deep/deeper2 deep/before deep/later &&
mkdir deep/deeper1/deepest &&
mkdir deep/deeper1/deepest2 &&
mkdir deep/deeper1/deepest3 &&
echo "after deeper1" >deep/e &&
echo "after deepest" >deep/deeper1/e &&
cp a folder1 &&
cp a folder2 &&
cp a x &&
cp a deep &&
cp a deep/before &&
cp a deep/deeper1 &&
cp a deep/deeper2 &&
cp a deep/later &&
cp a deep/deeper1/deepest &&
cp a deep/deeper1/deepest2 &&
cp a deep/deeper1/deepest3 &&
cp -r deep/deeper1/ deep/deeper2 &&
mkdir deep/deeper1/0 &&
mkdir deep/deeper1/0/0 &&
touch deep/deeper1/0/1 &&
touch deep/deeper1/0/0/0 &&
>folder1- &&
>folder1.x &&
>folder10 &&
cp -r deep/deeper1/0 folder1 &&
cp -r deep/deeper1/0 folder2 &&
echo >>folder1/0/0/0 &&
echo >>folder2/0/1 &&
git add . &&
git commit -m "initial commit" &&
git checkout -b base &&
for dir in folder1 folder2 deep
do
git checkout -b update-$dir base &&
echo "updated $dir" >$dir/a &&
git commit -a -m "update $dir" || return 1
done &&
git checkout -b rename-base base &&
cat >folder1/larger-content <<-\EOF &&
matching
lines
help
inexact
renames
EOF
cp folder1/larger-content folder2/ &&
cp folder1/larger-content deep/deeper1/ &&
git add . &&
git commit -m "add interesting rename content" &&
git checkout -b rename-out-to-out rename-base &&
mv folder1/a folder2/b &&
mv folder1/larger-content folder2/edited-content &&
echo >>folder2/edited-content &&
echo >>folder2/0/1 &&
echo stuff >>deep/deeper1/a &&
git add . &&
git commit -m "rename folder1/... to folder2/..." &&
git checkout -b rename-out-to-in rename-base &&
mv folder1/a deep/deeper1/b &&
echo more stuff >>deep/deeper1/a &&
rm folder2/0/1 &&
mkdir folder2/0/1 &&
echo >>folder2/0/1/1 &&
mv folder1/larger-content deep/deeper1/edited-content &&
echo >>deep/deeper1/edited-content &&
git add . &&
git commit -m "rename folder1/... to deep/deeper1/..." &&
git checkout -b rename-in-to-out rename-base &&
mv deep/deeper1/a folder1/b &&
echo >>folder2/0/1 &&
rm -rf folder1/0/0 &&
echo >>folder1/0/0 &&
mv deep/deeper1/larger-content folder1/edited-content &&
echo >>folder1/edited-content &&
git add . &&
git commit -m "rename deep/deeper1/... to folder1/..." &&
t1092: document bad 'git checkout' behavior Add new branches to the test repo that demonstrate directory/file conflicts in different ways. Since the directory 'folder1/' has adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes searches for 'folder1/' to land in a different place in the index than a search for 'folder1'. This causes a change in behavior when working with the df-conflict-1 and df-conflict-2 branches, whose only difference is that the first uses 'folder1' as the conflict and the other uses 'folder2' which does not have these adjacent files. We can extend two tests that compare the behavior across different 'git checkout' commands, and we see already that the behavior will be different in some cases and not in others. The difference between the two test loops is that one uses 'git reset --hard' between iterations. Further, we isolate the behavior of creating a staged change within a directory and then checking out a branch where that directory is replaced with a file. A full checkout behaves differently across these two cases, while a sparse-checkout cone behaves consistently. In both cases, the behavior is wrong. In one case, the staged change is dropped entirely. The other case the staged change is kept, replacing the file at that location, but none of the other files in the directory are kept. Likely, the correct behavior in this case is to reject the checkout and report the conflict, leaving HEAD in its previous location. None of the cases behave this way currently. Use comments to demonstrate that the tested behavior is only a documentation of the current, incorrect behavior to ensure we do not _accidentally_ change it. Instead, we would prefer to change it on purpose with a future change. At this point, the sparse-index does not handle these 'git checkout' commands correctly. Or rather, it _does_ reject the 'git checkout' when we have the staged change, but for the wrong reason. It also rejects the 'git checkout' commands when there is no staged change and we want to replace a directory with a file. A fix for that unstaged case will follow in the next change, but that will make the sparse-index agree with the full checkout case in these documented incorrect behaviors. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:40 +00:00
git checkout -b df-conflict-1 base &&
rm -rf folder1 &&
echo content >folder1 &&
git add . &&
git commit -m "dir to file" &&
git checkout -b df-conflict-2 base &&
rm -rf folder2 &&
echo content >folder2 &&
git add . &&
git commit -m "dir to file" &&
git checkout -b fd-conflict base &&
rm a &&
mkdir a &&
echo content >a/a &&
git add . &&
git commit -m "file to dir" &&
for side in left right
do
git checkout -b merge-$side base &&
echo $side >>deep/deeper2/a &&
echo $side >>folder1/a &&
echo $side >>folder2/a &&
git add . &&
git commit -m "$side" || return 1
done &&
git checkout -b deepest base &&
echo "updated deepest" >deep/deeper1/deepest/a &&
echo "updated deepest2" >deep/deeper1/deepest2/a &&
echo "updated deepest3" >deep/deeper1/deepest3/a &&
git commit -a -m "update deepest" &&
git checkout -f base &&
git reset --hard
)
'
init_repos () {
rm -rf full-checkout sparse-checkout sparse-index &&
# create repos in initial state
cp -r initial-repo full-checkout &&
git -C full-checkout reset --hard &&
cp -r initial-repo sparse-checkout &&
git -C sparse-checkout reset --hard &&
cp -r initial-repo sparse-index &&
git -C sparse-index reset --hard &&
# initialize sparse-checkout definitions
git -C sparse-checkout sparse-checkout init --cone &&
git -C sparse-checkout sparse-checkout set deep &&
git -C sparse-index sparse-checkout init --cone --sparse-index &&
test_cmp_config -C sparse-index true index.sparse &&
git -C sparse-index sparse-checkout set deep
}
builtin/grep.c: integrate with sparse index Turn on sparse index and remove ensure_full_index(). Before this patch, `git-grep` utilizes the ensure_full_index() method to expand the index and search all the entries. Because this method requires walking all the trees and constructing the index, it is the slow part within the whole command. To achieve better performance, this patch uses grep_tree() to search the sparse directory entries and get rid of the ensure_full_index() method. Why grep_tree() is a better choice over ensure_full_index()? 1) grep_tree() is as correct as ensure_full_index(). grep_tree() looks into every sparse-directory entry (represented by a tree) recursively when looping over the index, and the result of doing so matches the result of expanding the index. 2) grep_tree() utilizes pathspecs to limit the scope of searching. ensure_full_index() always expands the index, which means it will always walk all the trees and blobs in the repo without caring if the user only wants a subset of the content, i.e. using a pathspec. On the other hand, grep_tree() will only search the contents that match the pathspec, and thus possibly walking fewer trees. 3) grep_tree() does not construct and copy back a new index, while ensure_full_index() does. This also saves some time. ---------------- Performance test - Summary: p2000 tests demonstrate a ~71% execution time reduction for `git grep --cached bogus -- "f2/f1/f1/*"` using tree-walking logic. However, notice that this result varies depending on the pathspec given. See below "Command used for testing" for more details. Test HEAD~ HEAD ------------------------------------------------------- 2000.78: git grep ... (full-v3) 0.35 0.39 (≈) 2000.79: git grep ... (full-v4) 0.36 0.30 (≈) 2000.80: git grep ... (sparse-v3) 0.88 0.23 (-73.8%) 2000.81: git grep ... (sparse-v4) 0.83 0.26 (-68.6%) - Command used for testing: git grep --cached bogus -- "f2/f1/f1/*" The reason for specifying a pathspec is that, if we don't specify a pathspec, then grep_tree() will walk all the trees and blobs to find the pattern, and the time consumed doing so is not too different from using the original ensure_full_index() method, which also spends most of the time walking trees. However, when a pathspec is specified, this latest logic will only walk the area of trees enclosed by the pathspec, and the time consumed is reasonably a lot less. Generally speaking, because the performance gain is acheived by walking less trees, which are specified by the pathspec, the HEAD time v.s. HEAD~ time in sparse-v[3|4], should be proportional to "pathspec enclosed area" v.s. "all area", respectively. Namely, the wider the <pathspec> is encompassing, the less the performance difference between HEAD~ and HEAD, and vice versa. That is, if we don't specify a pathspec, the performance difference [1] is indistinguishable: both methods walk all the trees and take generally same amount of time (even with the index construction time included for ensure_full_index()). [1] Performance test result without pathspec (hence walking all trees): Command used: git grep --cached bogus Test HEAD~ HEAD --------------------------------------------------- 2000.78: git grep ... (full-v3) 6.17 5.19 (≈) 2000.79: git grep ... (full-v4) 6.19 5.46 (≈) 2000.80: git grep ... (sparse-v3) 6.57 6.44 (≈) 2000.81: git grep ... (sparse-v4) 6.65 6.28 (≈) -------------------------- NEEDSWORK about submodules There are a few NEEDSWORKs that belong to improvements beyond this topic. See the NEEDSWORK in builtin/grep.c::grep_submodule() for more context. The other two NEEDSWORKs in t1092 are also relative. Suggested-by: Derrick Stolee <derrickstolee@github.com> Helped-by: Derrick Stolee <derrickstolee@github.com> Helped-by: Victoria Dye <vdye@github.com> Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-23 04:18:42 +00:00
init_repos_as_submodules () {
git reset --hard &&
init_repos &&
git submodule add ./full-checkout &&
git submodule add ./sparse-checkout &&
git submodule add ./sparse-index &&
git submodule status >actual &&
grep full-checkout actual &&
grep sparse-checkout actual &&
grep sparse-index actual
}
run_on_sparse () {
(
cd sparse-checkout &&
GIT_PROGRESS_DELAY=100000 "$@" >../sparse-checkout-out 2>../sparse-checkout-err
) &&
(
cd sparse-index &&
GIT_PROGRESS_DELAY=100000 "$@" >../sparse-index-out 2>../sparse-index-err
)
}
run_on_all () {
(
cd full-checkout &&
GIT_PROGRESS_DELAY=100000 "$@" >../full-checkout-out 2>../full-checkout-err
) &&
run_on_sparse "$@"
}
test_all_match () {
run_on_all "$@" &&
test_cmp full-checkout-out sparse-checkout-out &&
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
test_cmp full-checkout-out sparse-index-out &&
test_cmp full-checkout-err sparse-checkout-err &&
test_cmp full-checkout-err sparse-index-err
}
test_sparse_match () {
run_on_sparse "$@" &&
test_cmp sparse-checkout-out sparse-index-out &&
test_cmp sparse-checkout-err sparse-index-err
}
test_sparse_unstaged () {
file=$1 &&
for repo in sparse-checkout sparse-index
do
# Skip "unmerged" paths
git -C $repo diff --staged --diff-filter=u -- "$file" >diff &&
test_must_be_empty diff || return 1
done
}
# Usage: test_sprase_checkout_set "<c1> ... <cN>" "<s1> ... <sM>"
# Verifies that "git sparse-checkout set <c1> ... <cN>" succeeds and
# leaves the sparse index in a state where <s1> ... <sM> are sparse
# directories (and <c1> ... <cN> are not).
test_sparse_checkout_set () {
CONE_DIRS=$1 &&
SPARSE_DIRS=$2 &&
git -C sparse-index sparse-checkout set --skip-checks $CONE_DIRS &&
git -C sparse-index ls-files --sparse --stage >cache &&
# Check that the directories outside of the sparse-checkout cone
# have sparse directory entries.
for dir in $SPARSE_DIRS
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
do
TREE=$(git -C sparse-index rev-parse HEAD:$dir) &&
grep "040000 $TREE 0 $dir/" cache \
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
|| return 1
done &&
# Check that the directories in the sparse-checkout cone
# are not sparse directory entries.
for dir in $CONE_DIRS
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
do
# Allow TREE to not exist because
# $dir does not exist at HEAD.
TREE=$(git -C sparse-index rev-parse HEAD:$dir) ||
! grep "040000 $TREE 0 $dir/" cache \
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
|| return 1
done
}
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
test_expect_success 'sparse-index contents' '
init_repos &&
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
# Remove deep, add three other directories.
test_sparse_checkout_set \
"folder1 folder2 x" \
"before deep" &&
# Remove folder1, add deep
test_sparse_checkout_set \
"deep folder2 x" \
"before folder1" &&
# Replace deep with deep/deeper2 (dropping deep/deeper1)
# Add folder1
test_sparse_checkout_set \
"deep/deeper2 folder1 folder2 x" \
"before deep/deeper1" &&
# Replace deep/deeper2 with deep/deeper1
# Replace folder1 with folder1/0/0
# Replace folder2 with non-existent folder2/2/3
# Add non-existent "bogus"
test_sparse_checkout_set \
"bogus deep/deeper1 folder1/0/0 folder2/2/3 x" \
"before deep/deeper2 folder2/0" &&
# Drop down to only files at root
test_sparse_checkout_set \
"" \
"before deep folder1 folder2 x" &&
# Disabling the sparse-index replaces tree entries with full ones
git -C sparse-index sparse-checkout init --no-sparse-index &&
test_sparse_match git ls-files --stage --sparse
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
'
test_expect_success 'expanded in-memory index matches full index' '
init_repos &&
test_sparse_match git ls-files --stage
'
sparse-index: prevent repo root from becoming sparse Prevent the repository root from being collapsed into a sparse directory by treating an empty path as "inside the sparse-checkout". When collapsing a sparse index (e.g. in 'git sparse-checkout reapply'), the root directory typically could not become a sparse directory due to the presence of in-cone root-level files and directories. However, if no such in-cone files or directories were present, there was no explicit check signaling that the "repository root path" (an empty string, in the case of 'convert_to_sparse(...)') was in-cone, and a sparse directory index entry would be created from the repository root directory. The documentation in Documentation/git-sparse-checkout.txt explicitly states that the files in the root directory are expected to be in-cone for a cone-mode sparse-checkout. Collapsing the root into a sparse directory entry violates that assumption, as sparse directory entries are expected to be outside the sparse cone and have SKIP_WORKTREE enabled. This invalid state in turn causes issues with commands that interact with the index, e.g. 'git status'. Treating an empty (root) path as in-cone prevents the creation of a root sparse directory in 'convert_to_sparse(...)'. Because the repository root is otherwise never compared with sparse patterns (in both cone-mode and non-cone sparse-checkouts), the new check does not cause additional changes to how sparse patterns are applied. Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 20:24:24 +00:00
test_expect_success 'root directory cannot be sparse' '
init_repos &&
# Remove all in-cone files and directories from the index, collapse index
# with `git sparse-checkout reapply`
git -C sparse-index rm -r . &&
git -C sparse-index sparse-checkout reapply &&
# Verify sparse directories still present, root directory is not sparse
cat >expect <<-EOF &&
before/
sparse-index: prevent repo root from becoming sparse Prevent the repository root from being collapsed into a sparse directory by treating an empty path as "inside the sparse-checkout". When collapsing a sparse index (e.g. in 'git sparse-checkout reapply'), the root directory typically could not become a sparse directory due to the presence of in-cone root-level files and directories. However, if no such in-cone files or directories were present, there was no explicit check signaling that the "repository root path" (an empty string, in the case of 'convert_to_sparse(...)') was in-cone, and a sparse directory index entry would be created from the repository root directory. The documentation in Documentation/git-sparse-checkout.txt explicitly states that the files in the root directory are expected to be in-cone for a cone-mode sparse-checkout. Collapsing the root into a sparse directory entry violates that assumption, as sparse directory entries are expected to be outside the sparse cone and have SKIP_WORKTREE enabled. This invalid state in turn causes issues with commands that interact with the index, e.g. 'git status'. Treating an empty (root) path as in-cone prevents the creation of a root sparse directory in 'convert_to_sparse(...)'. Because the repository root is otherwise never compared with sparse patterns (in both cone-mode and non-cone sparse-checkouts), the new check does not cause additional changes to how sparse patterns are applied. Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 20:24:24 +00:00
folder1/
folder2/
x/
EOF
git -C sparse-index ls-files --sparse >actual &&
test_cmp expect actual
'
test_expect_success 'status with options' '
init_repos &&
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
test_sparse_match ls &&
test_all_match git status --porcelain=v2 &&
test_all_match git status --porcelain=v2 -z -u &&
test_all_match git status --porcelain=v2 -uno &&
run_on_all touch README.md &&
test_all_match git status --porcelain=v2 &&
test_all_match git status --porcelain=v2 -z -u &&
test_all_match git status --porcelain=v2 -uno &&
test_all_match git add README.md &&
test_all_match git status --porcelain=v2 &&
test_all_match git status --porcelain=v2 -z -u &&
test_all_match git status --porcelain=v2 -uno
'
status: fix nested sparse directory diff in sparse index Enable the 'recursive' diff option for the diff executed as part of 'git status'. Without the 'recursive' enabled, 'git status' reports index changes incorrectly when the following conditions were met: * sparse index is enabled * there is a difference between the index and HEAD in a file inside a *subdirectory* of a sparse directory * the sparse directory index entry is *not* expanded in-core Because it is not recursive by default, the diff in 'git status' reports changes only at the level of files and directories that are immediate children of a sparse directory, rather than recursing into directories with changes to identify the modified file(s). As a result, 'git status' reports the immediate subdirectory itself as "modified". Example: $ git init $ mkdir -p sparse/sub $ echo test >sparse/sub/foo $ git add . $ git commit -m "commit 1" $ echo somethingelse >sparse/sub/foo $ git add . $ git commit -a -m "commit 2" $ git sparse-checkout set --cone --sparse-index 'sparse' $ git reset --soft HEAD~1 $ git status On branch master You are in a sparse checkout. Changes to be committed: (use "git restore --staged <file>..." to unstage) modified: sparse/sub Enabling the 'recursive' diff option in 'wt_status_collect_changes_index' corrects this issue by allowing the diff to recurse into subdirectories of sparse directories to find modified files. Given the same repository setup as the example above, the corrected result of `git status` is: $ git status On branch master You are in a sparse checkout. Changes to be committed: (use "git restore --staged <file>..." to unstage) modified: sparse/sub/foo Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 20:24:25 +00:00
test_expect_success 'status with diff in unexpanded sparse directory' '
init_repos &&
test_all_match git checkout rename-base &&
test_all_match git reset --soft rename-out-to-out &&
test_all_match git status --porcelain=v2
'
test_expect_success 'status reports sparse-checkout' '
init_repos &&
git -C sparse-checkout status >full &&
git -C sparse-index status >sparse &&
test_grep "You are in a sparse checkout with " full &&
test_grep "You are in a sparse checkout." sparse
'
test_expect_success 'add, commit, checkout' '
init_repos &&
write_script edit-contents <<-\EOF &&
echo text >>$1
EOF
run_on_all ../edit-contents README.md &&
test_all_match git add README.md &&
test_all_match git status --porcelain=v2 &&
test_all_match git commit -m "Add README.md" &&
test_all_match git checkout HEAD~1 &&
test_all_match git checkout - &&
run_on_all ../edit-contents README.md &&
test_all_match git add -A &&
test_all_match git status --porcelain=v2 &&
test_all_match git commit -m "Extend README.md" &&
test_all_match git checkout HEAD~1 &&
test_all_match git checkout - &&
run_on_all ../edit-contents deep/newfile &&
test_all_match git status --porcelain=v2 -uno &&
test_all_match git status --porcelain=v2 &&
test_all_match git add . &&
test_all_match git status --porcelain=v2 &&
test_all_match git commit -m "add deep/newfile" &&
test_all_match git checkout HEAD~1 &&
test_all_match git checkout -
'
unpack-trees: use traverse_path instead of name The sparse_dir_matches_path() method compares a cache entry that is a sparse directory entry against a 'struct traverse_info *info' and a 'struct name_entry *p' to see if the cache entry has exactly the right name for those other inputs. This method was introduced in 523506d (unpack-trees: unpack sparse directory entries, 2021-07-14), but included a significant mistake. The path comparisons used 'info->name' instead of 'info->traverse_path'. Since 'info->name' only stores a single tree entry name while 'info->traverse_path' stores the full path from root, this method does not work when 'info' is in a subdirectory of a directory. Replacing the right strings and their corresponding lengths make the method work properly. The previous change included a failing test that exposes this issue. That test now passes. The critical detail is that as we go deep into unpack_trees(), the logic for merging a sparse directory entry with a tree entry during 'git checkout' relies on this sparse_dir_matches_path() in order to avoid calling traverse_trees_recursive() during unpack_callback() in this hunk: if (!is_sparse_directory_entry(src[0], names, info) && traverse_trees_recursive(n, dirmask, mask & ~dirmask, names, info) < 0) { return -1; } For deep paths, the short-circuit never occurred and traverse_trees_recursive() was being called incorrectly and that was causing other strange issues. Specifically, the error message from the now-passing test previously included this: error: Your local changes to the following files would be overwritten by checkout: deep/deeper1/deepest2/a deep/deeper1/deepest3/a Please commit your changes or stash them before you switch branches. Aborting These messages occurred because the 'current' cache entry in twoway_merge() was showing as NULL because the index did not contain entries for the paths contained within the sparse directory entries. We instead had 'oldtree' given as the entry at HEAD and 'newtree' as the entry in the target tree. This led to reject_merge() listing these paths. Now that sparse_dir_matches_path() works the same for deep paths as it does for shallow depths, the rest of the logic kicks in to properly handle modifying the sparse directory entries as designed. Reported-by: Gustave Granroth <gus.gran@gmail.com> Reported-by: Mike Marcelais <michmarc@exchange.microsoft.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-06 14:10:37 +00:00
test_expect_success 'deep changes during checkout' '
init_repos &&
test_sparse_match git sparse-checkout set deep/deeper1/deepest &&
test_all_match git checkout deepest &&
test_all_match git checkout base
'
test_expect_success 'checkout with modified sparse directory' '
init_repos &&
test_all_match git checkout rename-in-to-out -- . &&
test_sparse_match git sparse-checkout reapply &&
test_all_match git checkout base
'
test_expect_success 'checkout orphan then non-orphan' '
init_repos &&
test_all_match git checkout --orphan test-orphan &&
test_all_match git status --porcelain=v2 &&
test_all_match git checkout base &&
test_all_match git status --porcelain=v2
'
unpack-trees: increment cache_bottom for sparse directories Correct tracking of the 'cache_bottom' for cases where sparse directories are present in the index. BACKGROUND ---------- The 'unpack_trees_options.cache_bottom' is a variable that tracks the in-progress "bottom" of the cache as 'unpack_trees()' iterates through the contents of the index. Most importantly, this value informs the sequential return values of 'next_cache_entry()' which, in the "diff cache" usage of 'unpack_callback()', are either unpacked as-is or are passed into the diff machinery. The 'cache_bottom' is intended to track the position of the first entry in the index that has not yet been diffed or unpacked. It is advanced in two main ways: either it is incremented when an index entry is marked as "used" (in 'mark_ce_used()'), indicating that it was unpacked or diffed, or when a directory is unpacked, in which case it is increased by an amount equaling the number of index entries inside that tree. In 17a1bb570b (unpack-trees: preserve cache_bottom, 2021-07-14), it was identified that sparse directories posed a problem to the above 'cache_bottom' advancement logic - because a sparse directory was both an index entry that could be "used" and a directory that can be unpacked, the 'cache_bottom' would be incremented too many times. To solve this problem, the 'mark_ce_used()' advancement of 'cache_bottom' was skipped for sparse directories. INCORRECT CACHE_BOTTOM TRACKING ------------------------------- Skipping the 'cache_bottom' advancement for sparse directories in 'mark_ce_used()' breaks down in two cases: 1. When the 'unpack_trees()' operation is *not* a "cache diff" (because the directory contents-based incrementing of 'cache_bottom' does not happen). 2. When a cache diff is performed with a pathspec (because 'unpack_index_entry()' will unpack a sparse directory not matched by the pathspec without performing the directory contents-based increment). The former luckily does not appear to affect 'git' behavior, likely because 'cache_bottom' is largely unused (non-"cache diff" 'unpack_trees()' uses 'find_index_entry()' - rather than 'next_cache_entry()' - to find the index entries to unpack). The latter, however, causes 'cache_bottom' to "lag behind" its intended position by an amount equal to the number of sparse directories unpacked so far with 'unpack_index_entry()'. If a repository is structured such that any sparse directories are ordered lexicographically *after* any pathspec-matching directories, though, this issue won't present any adverse behavior. This was the case with the 't1092-sparse-checkout-compatibility.sh' tests before the addition of the 'before/' sparse directory (ordered *before* the in-cone 'deep/' directory), therefore sidestepping the issue. Once the 'before/' directory was added, though, 'cache_bottom' began to lag behind its intended position, causing 'next_cache_entry()' to return index entries it had already processed and, ultimately, an incorrect diff. CORRECTING CACHE_BOTTOM ----------------------- The problems observed in 't1092' come from 'cache_bottom' lagging behind in cases where the cache tree-based advancement doesn't occur. To solve this, then, the fix in 17a1bb570b is "reversed"; rather than skipping 'cache_bottom' advancement in 'mark_ce_used()', we skip the directory contents-based advancement for sparse directories. Now, every index entry can be accounted for in 'cache_bottom': * if you're working with a single index entry, 'cache_bottom' is incremented in 'mark_ce_used()' * if you're working with a directory that contains index entries (but is not one itself), 'cache_bottom' is incremented by the number of entries in that directory. Finally, change the 'test_expect_failure' tests in 't1092' failing due to this bug back to 'test_expect_success'. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-17 15:55:35 +00:00
test_expect_success 'add outside sparse cone' '
init_repos &&
run_on_sparse mkdir folder1 &&
run_on_sparse ../edit-contents folder1/a &&
run_on_sparse ../edit-contents folder1/newfile &&
test_sparse_match test_must_fail git add folder1/a &&
grep "Disable or modify the sparsity rules" sparse-checkout-err &&
test_sparse_unstaged folder1/a &&
test_sparse_match test_must_fail git add folder1/newfile &&
grep "Disable or modify the sparsity rules" sparse-checkout-err &&
test_sparse_unstaged folder1/newfile
'
commit: integrate with sparse-index Update 'git commit' to allow using the sparse-index in memory without expanding to a full one. The only place that had an ensure_full_index() call was in cache_tree_update(). The recursive algorithm for update_one() was already updated in 2de37c536 (cache-tree: integrate with sparse directory entries, 2021-03-03) to handle sparse directory entries in the index. Most of this change involves testing different command-line options that allow specifying which on-disk changes should be included in the commit. This includes no options (only take currently-staged changes), -a (take all tracked changes), and --include (take a list of specific changes). To simplify testing that these options do not expand the index, update the test that previously verified that 'git status' does not expand the index with a helper method, ensure_not_expanded(). This allows 'git commit' to operate much faster when the sparse-checkout cone is much smaller than the full list of files at HEAD. Here are the relevant lines from p2000-sparse-operations.sh: Test HEAD~1 HEAD ---------------------------------------------------------------------------------- 2000.14: git commit -a -m A (full-v3) 0.35(0.26+0.06) 0.36(0.28+0.07) +2.9% 2000.15: git commit -a -m A (full-v4) 0.32(0.26+0.05) 0.34(0.28+0.06) +6.3% 2000.16: git commit -a -m A (sparse-v3) 0.63(0.59+0.06) 0.04(0.05+0.05) -93.7% 2000.17: git commit -a -m A (sparse-v4) 0.64(0.59+0.08) 0.04(0.04+0.04) -93.8% It is important to compare the full-index case to the sparse-index case, so the improvement for index version v4 is actually an 88% improvement in this synthetic example. In a real repository with over two million files at HEAD and 60,000 files in the sparse-checkout definition, the time for 'git commit -a' went from 2.61 seconds to 134ms. I compared this to the result if the index only contained the paths in the sparse-checkout definition and found the theoretical optimum to be 120ms, so the out-of-cone paths only add a 12% overhead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-29 02:13:04 +00:00
test_expect_success 'commit including unstaged changes' '
init_repos &&
write_script edit-file <<-\EOF &&
echo $1 >$2
EOF
run_on_all ../edit-file 1 a &&
run_on_all ../edit-file 1 deep/a &&
test_all_match git commit -m "-a" -a &&
test_all_match git status --porcelain=v2 &&
run_on_all ../edit-file 2 a &&
run_on_all ../edit-file 2 deep/a &&
test_all_match git commit -m "--include" --include deep/a &&
test_all_match git status --porcelain=v2 &&
test_all_match git commit -m "--include" --include a &&
test_all_match git status --porcelain=v2 &&
run_on_all ../edit-file 3 a &&
run_on_all ../edit-file 3 deep/a &&
test_all_match git commit -m "--amend" -a --amend &&
test_all_match git status --porcelain=v2
'
unpack-trees: increment cache_bottom for sparse directories Correct tracking of the 'cache_bottom' for cases where sparse directories are present in the index. BACKGROUND ---------- The 'unpack_trees_options.cache_bottom' is a variable that tracks the in-progress "bottom" of the cache as 'unpack_trees()' iterates through the contents of the index. Most importantly, this value informs the sequential return values of 'next_cache_entry()' which, in the "diff cache" usage of 'unpack_callback()', are either unpacked as-is or are passed into the diff machinery. The 'cache_bottom' is intended to track the position of the first entry in the index that has not yet been diffed or unpacked. It is advanced in two main ways: either it is incremented when an index entry is marked as "used" (in 'mark_ce_used()'), indicating that it was unpacked or diffed, or when a directory is unpacked, in which case it is increased by an amount equaling the number of index entries inside that tree. In 17a1bb570b (unpack-trees: preserve cache_bottom, 2021-07-14), it was identified that sparse directories posed a problem to the above 'cache_bottom' advancement logic - because a sparse directory was both an index entry that could be "used" and a directory that can be unpacked, the 'cache_bottom' would be incremented too many times. To solve this problem, the 'mark_ce_used()' advancement of 'cache_bottom' was skipped for sparse directories. INCORRECT CACHE_BOTTOM TRACKING ------------------------------- Skipping the 'cache_bottom' advancement for sparse directories in 'mark_ce_used()' breaks down in two cases: 1. When the 'unpack_trees()' operation is *not* a "cache diff" (because the directory contents-based incrementing of 'cache_bottom' does not happen). 2. When a cache diff is performed with a pathspec (because 'unpack_index_entry()' will unpack a sparse directory not matched by the pathspec without performing the directory contents-based increment). The former luckily does not appear to affect 'git' behavior, likely because 'cache_bottom' is largely unused (non-"cache diff" 'unpack_trees()' uses 'find_index_entry()' - rather than 'next_cache_entry()' - to find the index entries to unpack). The latter, however, causes 'cache_bottom' to "lag behind" its intended position by an amount equal to the number of sparse directories unpacked so far with 'unpack_index_entry()'. If a repository is structured such that any sparse directories are ordered lexicographically *after* any pathspec-matching directories, though, this issue won't present any adverse behavior. This was the case with the 't1092-sparse-checkout-compatibility.sh' tests before the addition of the 'before/' sparse directory (ordered *before* the in-cone 'deep/' directory), therefore sidestepping the issue. Once the 'before/' directory was added, though, 'cache_bottom' began to lag behind its intended position, causing 'next_cache_entry()' to return index entries it had already processed and, ultimately, an incorrect diff. CORRECTING CACHE_BOTTOM ----------------------- The problems observed in 't1092' come from 'cache_bottom' lagging behind in cases where the cache tree-based advancement doesn't occur. To solve this, then, the fix in 17a1bb570b is "reversed"; rather than skipping 'cache_bottom' advancement in 'mark_ce_used()', we skip the directory contents-based advancement for sparse directories. Now, every index entry can be accounted for in 'cache_bottom': * if you're working with a single index entry, 'cache_bottom' is incremented in 'mark_ce_used()' * if you're working with a directory that contains index entries (but is not one itself), 'cache_bottom' is incremented by the number of entries in that directory. Finally, change the 'test_expect_failure' tests in 't1092' failing due to this bug back to 'test_expect_success'. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-17 15:55:35 +00:00
test_expect_success 'status/add: outside sparse cone' '
init_repos &&
# folder1 is at HEAD, but outside the sparse cone
run_on_sparse mkdir folder1 &&
cp initial-repo/folder1/a sparse-checkout/folder1/a &&
cp initial-repo/folder1/a sparse-index/folder1/a &&
test_sparse_match git status &&
write_script edit-contents <<-\EOF &&
echo text >>$1
EOF
repo_read_index: clear SKIP_WORKTREE bit from files present in worktree The fix is short (~30 lines), but the description is not. Sorry. There is a set of problems caused by files in what I'll refer to as the "present-despite-SKIP_WORKTREE" state. This commit aims to not just fix these problems, but remove the entire class as a possibility -- for those using sparse checkouts. But first, we need to understand the problems this class presents. A quick outline: * Problems * User facing issues * Problem space complexity * Maintenance and code correctness challenges * SKIP_WORKTREE expectations in Git * Suggested solution * Pros/Cons of suggested solution * Notes on testcase modifications === User facing issues === There are various ways for users to get files to be present in the working copy despite having the SKIP_WORKTREE bit set for that file in the index. This may come from: * various git commands not really supporting the SKIP_WORKTREE bit[1,2] * users grabbing files from elsewhere and writing them to the worktree (perhaps even cached in their editor) * users attempting to "abort" a sparse-checkout operation with a not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the working tree is not atomic)[3]. Once users have present-despite-SKIP_WORKTREE files, any modifications users make to these files will be ignored, possibly to users' confusion. Further: * these files will degrade performance for the sparse-index case due to requiring the index to be expanded (see commit 55dfcf9591 ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why we try to delete entire directories outside the sparse cone). * these files will not be updated by by standard commands (switch/checkout/pull/merge/rebase will leave them alone unless conflicts happen -- and even then, the conflicted file may be written somewhere else to avoid overwriting the SKIP_WORKTREE file that is present and in the way) * there is nothing in Git that users can use to discover such files (status, diff, grep, etc. all ignore it) * there is no reasonable mechanism to "recover" from such a condition (neither `git sparse-checkout reapply` nor `git reset --hard` will correct it). So, not only are users modifications ignored, but the files get progressively more stale over time. At some point in the future, they may change their sparseness specification or disable sparse-checkouts. At that time, all present-despite-SKIP_WORKTREE files will show up as having lots of modifications because they represent a version from a different branch or commit. These might include user-made local changes from days before, but the only way to tell is to have users look through them all closely. If these users come to others for help, there will be no logs that explain the issue; it's just a mysterious list of changes. Users might adamantly claim (correctly, as it turns out) that they didn't modify these files, while others presume they did. [1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ [2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/ === Problem space complexity === SKIP_WORKTREE has been part of Git for over a decade. Duy did lots of work on it initially, and several others have since come along and put lots of work into it. Stolee spent most of 2021 on the sparse-index, with lots of bugfixes along the way including to non-sparse-index cases as we are still trying to get sparse checkouts to behave reasonably. Basically every codepath throughout the treat needs to be aware of an additional type of file: tracked-but-not-present. The extra type results in lots of extra testcases and lots of extra code everywhere. But, the sad thing is that we actually have more than one extra type. We have tracked, tracked-but-not-present (SKIP_WORKTREE), and tracked-but-promised-to-not-be-present-but-is-present-anyway (present-despite-SKIP_WORKTREE). Two types is a monumental amount of effort to support, and adding a third feels a bit like insanity[4]. [4] Some examples of which can be seen at https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ === Maintenance and code correctness challenges === Matheus' patches to grep stalled for nearly a year, in part because of complications of how to handle sparse-checkouts appropriately in all cases[5][6] (with trying to sanely figure out how to sanely handle present-despite-SKIP_WORKTREE files being one of the complications). His rm/add follow-ups also took months because of those kinds of issues[7]. The corner cases with things like submodules and SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start becoming really complex[8]. We've had to add ugly logic to merge-ort to attempt to handle present-despite-SKIP_WORKTREE files[9], and basically just been forced to give up in merge-recursive knowing full well that we'll sometimes silently discard user modifications. Despite stash essentially being a merge, it needed extra code (beyond what was in merge-ort and merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid a few different bugs that'd result in an early abort with a partial stash application[10]. [5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t and the dates on the thread; also Matheus and I had several conversations off-list trying to resolve the issues over that time [6] ...it finally kind of got unstuck after https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ [7] See for example https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t and quotes like "The core functionality of sparse-checkout has always been only partially implemented", a statement I still believe is true today. [8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/ [9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries", 2021-03-20) [10] See commit ba359fd507 ("stash: fix stash application in sparse-checkouts", 2020-12-01) === SKIP_WORKTREE expectations in Git === A couple quotes: * From [11] (before the "sparse-checkout" command existed): If it needs too many special cases, hacks, and conditionals, then it is not worth the complexity---if it is easier to write a correct code by allowing Git to populate working tree files, it is perfectly fine to do so. In a sense, the sparse checkout "feature" itself is a hack by itself, and that is why I think this part should be "best effort" as well. * From the git-sparse-checkout manual (still present today): THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE. [11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ === Suggested solution === SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as the name of the option implies, to allow the file to NOT be in the worktree but consider it to be unchanged rather than deleted. The suggests a simple solution: present-despite-SKIP_WORKTREE files should not exist, for those using sparse-checkouts. Enforce this at index loading time by checking if core.sparseCheckout is true; if so, check files in the index with the SKIP_WORKTREE bit set to verify that they are absent from the working tree. If they are present, unset the bit (in memory, though any commands that write to the index will record the update). Users can, of course, can get the SKIP_WORKTREE bit back such as by running `git sparse-checkout reapply` (if they have ensured the file is unmodified and doesn't match the specified sparsity patterns). === Pros/Cons of suggested solution === Pros: * Solves the user visible problems reported above, which I've been complaining about for nearly a year but couldn't find a solution to. * Helps prevent slow performance degradation with a sparse-index. * Much easier behavior in sparse-checkouts for users to reason about * Very simple, ~30 lines of code. * Significantly simplifies some ugly testcases, and obviates the need to test an entire class of potential issues. * Reduces code complexity, reasoning, and maintenance. Avoids disagreements about weird corner cases[12]. * It has been reported that some users might be (ab)using SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree mechanism[13, and a few other similar references]. These users know of multiple caveats and shortcomings in doing so; perhaps not surprising given the "SKIP_WORKTREE expecations" section above. However, these users use `git update-index --skip-worktree`, and not `git sparse-checkout` or core.sparseCheckout=true. As such, these users would be unaffected by this change and can continue abusing the system as before. [12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree Cons: * When core.sparseCheckout is enabled, this adds a performance cost to reading the index. I'll defer discussion of this cost to a subsequent patch, since I have some optimizations to add. === Notes on testcase modifications === The good: * t1011: Compare to two cases above it ('read-tree will not throw away dirty changes, non-sparse'); since the file is present, it should match the non-sparse case now * t1092: sparse-index & sparse-checkout now match full-worktree behavior in more cases! Yaay for consistency! * t6428, t7012: look at how much simpler the tests become! Merge and stash can just fail early telling the user there's a file in the way, instead of not noticing until it's about to write a file and then have to implement sudden crash avoidance. Hurray for sanity! * t7817: sparse behavior better matches full tree behavior. Hurray for sanity! The confusing: * t3705: These changes were ONLY needed on Windows, but they don't hurt other platforms. Let's discuss each individually: * core.sparseCheckout should be false by default. Nothing in this testcase toggles that until many, many tests later. However, early tests (#5 in particular) were testing `update-index --skip-worktree` behavior in a non-sparse-checkout, but the Windows tests in CI were behaving as if core.sparseCheckout=true had been specified somewhere. I do not have access to a Windows machine. But I just manually did what should have been a no-op and turned the config off. And it fixed the test. * I have no idea why the leftover .gitattributes file from this test was causing failures for test #18 on Windows, but only with these changes of mine. Test #18 was checking for empty stderr, and specifically wanted to know that some error completely unrelated to file endings did not appear. The leftover .gitattributes file thus caused some spurious stderr unrelated to the thing being checked. Since other tests did not intend to test normalization, just proactively remove the .gitattributes file. I'm certain this is cleaner and better, I'm just unsure why/how this didn't trigger problems before. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-14 15:59:41 +00:00
run_on_all ../edit-contents folder1/a &&
run_on_all ../edit-contents folder1/new &&
test_sparse_match git status --porcelain=v2 &&
add: allow operating on a sparse-only index Disable command_requires_full_index for 'git add'. This does not require any additional removals of ensure_full_index(). The main reason is that 'git add' discovers changes based on the pathspec and the worktree itself. These are then inserted into the index directly, and calls to index_name_pos() or index_file_exists() already call expand_to_path() at the appropriate time to support a sparse-index. Add a test to check that 'git add -A' and 'git add <file>' does not expand the index at all, as long as <file> is not within a sparse directory. This does not help the global 'git add .' case. We can measure the improvement using p2000-sparse-operations.sh with these results: Test HEAD~1 HEAD ------------------------------------------------------------------------------ 2000.6: git add -A (full-index-v3) 0.35(0.30+0.05) 0.37(0.29+0.06) +5.7% 2000.7: git add -A (full-index-v4) 0.31(0.26+0.06) 0.33(0.27+0.06) +6.5% 2000.8: git add -A (sparse-index-v3) 0.57(0.53+0.07) 0.05(0.04+0.08) -91.2% 2000.9: git add -A (sparse-index-v4) 0.58(0.55+0.06) 0.05(0.05+0.06) -91.4% While the 91% improvement seems impressive, it's important to recognize that previously we had significant overhead for expanding the sparse-index. Comparing to the full index case, 'git add -A' goes from 0.37s to 0.05s, which is "only" an 86% improvement. This modification to 'git add' creates some behavior change depending on the use of a sparse index. We modify a test in t1092 to demonstrate these changes which will be remedied in future changes. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-29 14:52:04 +00:00
# Adding the path outside of the sparse-checkout cone should fail.
test_sparse_match test_must_fail git add folder1/a &&
grep "Disable or modify the sparsity rules" sparse-checkout-err &&
test_sparse_unstaged folder1/a &&
repo_read_index: clear SKIP_WORKTREE bit from files present in worktree The fix is short (~30 lines), but the description is not. Sorry. There is a set of problems caused by files in what I'll refer to as the "present-despite-SKIP_WORKTREE" state. This commit aims to not just fix these problems, but remove the entire class as a possibility -- for those using sparse checkouts. But first, we need to understand the problems this class presents. A quick outline: * Problems * User facing issues * Problem space complexity * Maintenance and code correctness challenges * SKIP_WORKTREE expectations in Git * Suggested solution * Pros/Cons of suggested solution * Notes on testcase modifications === User facing issues === There are various ways for users to get files to be present in the working copy despite having the SKIP_WORKTREE bit set for that file in the index. This may come from: * various git commands not really supporting the SKIP_WORKTREE bit[1,2] * users grabbing files from elsewhere and writing them to the worktree (perhaps even cached in their editor) * users attempting to "abort" a sparse-checkout operation with a not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the working tree is not atomic)[3]. Once users have present-despite-SKIP_WORKTREE files, any modifications users make to these files will be ignored, possibly to users' confusion. Further: * these files will degrade performance for the sparse-index case due to requiring the index to be expanded (see commit 55dfcf9591 ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why we try to delete entire directories outside the sparse cone). * these files will not be updated by by standard commands (switch/checkout/pull/merge/rebase will leave them alone unless conflicts happen -- and even then, the conflicted file may be written somewhere else to avoid overwriting the SKIP_WORKTREE file that is present and in the way) * there is nothing in Git that users can use to discover such files (status, diff, grep, etc. all ignore it) * there is no reasonable mechanism to "recover" from such a condition (neither `git sparse-checkout reapply` nor `git reset --hard` will correct it). So, not only are users modifications ignored, but the files get progressively more stale over time. At some point in the future, they may change their sparseness specification or disable sparse-checkouts. At that time, all present-despite-SKIP_WORKTREE files will show up as having lots of modifications because they represent a version from a different branch or commit. These might include user-made local changes from days before, but the only way to tell is to have users look through them all closely. If these users come to others for help, there will be no logs that explain the issue; it's just a mysterious list of changes. Users might adamantly claim (correctly, as it turns out) that they didn't modify these files, while others presume they did. [1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ [2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/ === Problem space complexity === SKIP_WORKTREE has been part of Git for over a decade. Duy did lots of work on it initially, and several others have since come along and put lots of work into it. Stolee spent most of 2021 on the sparse-index, with lots of bugfixes along the way including to non-sparse-index cases as we are still trying to get sparse checkouts to behave reasonably. Basically every codepath throughout the treat needs to be aware of an additional type of file: tracked-but-not-present. The extra type results in lots of extra testcases and lots of extra code everywhere. But, the sad thing is that we actually have more than one extra type. We have tracked, tracked-but-not-present (SKIP_WORKTREE), and tracked-but-promised-to-not-be-present-but-is-present-anyway (present-despite-SKIP_WORKTREE). Two types is a monumental amount of effort to support, and adding a third feels a bit like insanity[4]. [4] Some examples of which can be seen at https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ === Maintenance and code correctness challenges === Matheus' patches to grep stalled for nearly a year, in part because of complications of how to handle sparse-checkouts appropriately in all cases[5][6] (with trying to sanely figure out how to sanely handle present-despite-SKIP_WORKTREE files being one of the complications). His rm/add follow-ups also took months because of those kinds of issues[7]. The corner cases with things like submodules and SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start becoming really complex[8]. We've had to add ugly logic to merge-ort to attempt to handle present-despite-SKIP_WORKTREE files[9], and basically just been forced to give up in merge-recursive knowing full well that we'll sometimes silently discard user modifications. Despite stash essentially being a merge, it needed extra code (beyond what was in merge-ort and merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid a few different bugs that'd result in an early abort with a partial stash application[10]. [5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t and the dates on the thread; also Matheus and I had several conversations off-list trying to resolve the issues over that time [6] ...it finally kind of got unstuck after https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ [7] See for example https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t and quotes like "The core functionality of sparse-checkout has always been only partially implemented", a statement I still believe is true today. [8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/ [9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries", 2021-03-20) [10] See commit ba359fd507 ("stash: fix stash application in sparse-checkouts", 2020-12-01) === SKIP_WORKTREE expectations in Git === A couple quotes: * From [11] (before the "sparse-checkout" command existed): If it needs too many special cases, hacks, and conditionals, then it is not worth the complexity---if it is easier to write a correct code by allowing Git to populate working tree files, it is perfectly fine to do so. In a sense, the sparse checkout "feature" itself is a hack by itself, and that is why I think this part should be "best effort" as well. * From the git-sparse-checkout manual (still present today): THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE. [11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ === Suggested solution === SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as the name of the option implies, to allow the file to NOT be in the worktree but consider it to be unchanged rather than deleted. The suggests a simple solution: present-despite-SKIP_WORKTREE files should not exist, for those using sparse-checkouts. Enforce this at index loading time by checking if core.sparseCheckout is true; if so, check files in the index with the SKIP_WORKTREE bit set to verify that they are absent from the working tree. If they are present, unset the bit (in memory, though any commands that write to the index will record the update). Users can, of course, can get the SKIP_WORKTREE bit back such as by running `git sparse-checkout reapply` (if they have ensured the file is unmodified and doesn't match the specified sparsity patterns). === Pros/Cons of suggested solution === Pros: * Solves the user visible problems reported above, which I've been complaining about for nearly a year but couldn't find a solution to. * Helps prevent slow performance degradation with a sparse-index. * Much easier behavior in sparse-checkouts for users to reason about * Very simple, ~30 lines of code. * Significantly simplifies some ugly testcases, and obviates the need to test an entire class of potential issues. * Reduces code complexity, reasoning, and maintenance. Avoids disagreements about weird corner cases[12]. * It has been reported that some users might be (ab)using SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree mechanism[13, and a few other similar references]. These users know of multiple caveats and shortcomings in doing so; perhaps not surprising given the "SKIP_WORKTREE expecations" section above. However, these users use `git update-index --skip-worktree`, and not `git sparse-checkout` or core.sparseCheckout=true. As such, these users would be unaffected by this change and can continue abusing the system as before. [12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree Cons: * When core.sparseCheckout is enabled, this adds a performance cost to reading the index. I'll defer discussion of this cost to a subsequent patch, since I have some optimizations to add. === Notes on testcase modifications === The good: * t1011: Compare to two cases above it ('read-tree will not throw away dirty changes, non-sparse'); since the file is present, it should match the non-sparse case now * t1092: sparse-index & sparse-checkout now match full-worktree behavior in more cases! Yaay for consistency! * t6428, t7012: look at how much simpler the tests become! Merge and stash can just fail early telling the user there's a file in the way, instead of not noticing until it's about to write a file and then have to implement sudden crash avoidance. Hurray for sanity! * t7817: sparse behavior better matches full tree behavior. Hurray for sanity! The confusing: * t3705: These changes were ONLY needed on Windows, but they don't hurt other platforms. Let's discuss each individually: * core.sparseCheckout should be false by default. Nothing in this testcase toggles that until many, many tests later. However, early tests (#5 in particular) were testing `update-index --skip-worktree` behavior in a non-sparse-checkout, but the Windows tests in CI were behaving as if core.sparseCheckout=true had been specified somewhere. I do not have access to a Windows machine. But I just manually did what should have been a no-op and turned the config off. And it fixed the test. * I have no idea why the leftover .gitattributes file from this test was causing failures for test #18 on Windows, but only with these changes of mine. Test #18 was checking for empty stderr, and specifically wanted to know that some error completely unrelated to file endings did not appear. The leftover .gitattributes file thus caused some spurious stderr unrelated to the thing being checked. Since other tests did not intend to test normalization, just proactively remove the .gitattributes file. I'm certain this is cleaner and better, I'm just unsure why/how this didn't trigger problems before. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-14 15:59:41 +00:00
test_all_match git add --refresh folder1/a &&
test_must_be_empty sparse-checkout-err &&
test_sparse_unstaged folder1/a &&
test_sparse_match test_must_fail git add folder1/new &&
grep "Disable or modify the sparsity rules" sparse-checkout-err &&
test_sparse_unstaged folder1/new &&
test_sparse_match git add --sparse folder1/a &&
test_sparse_match git add --sparse folder1/new &&
add: allow operating on a sparse-only index Disable command_requires_full_index for 'git add'. This does not require any additional removals of ensure_full_index(). The main reason is that 'git add' discovers changes based on the pathspec and the worktree itself. These are then inserted into the index directly, and calls to index_name_pos() or index_file_exists() already call expand_to_path() at the appropriate time to support a sparse-index. Add a test to check that 'git add -A' and 'git add <file>' does not expand the index at all, as long as <file> is not within a sparse directory. This does not help the global 'git add .' case. We can measure the improvement using p2000-sparse-operations.sh with these results: Test HEAD~1 HEAD ------------------------------------------------------------------------------ 2000.6: git add -A (full-index-v3) 0.35(0.30+0.05) 0.37(0.29+0.06) +5.7% 2000.7: git add -A (full-index-v4) 0.31(0.26+0.06) 0.33(0.27+0.06) +6.5% 2000.8: git add -A (sparse-index-v3) 0.57(0.53+0.07) 0.05(0.04+0.08) -91.2% 2000.9: git add -A (sparse-index-v4) 0.58(0.55+0.06) 0.05(0.05+0.06) -91.4% While the 91% improvement seems impressive, it's important to recognize that previously we had significant overhead for expanding the sparse-index. Comparing to the full index case, 'git add -A' goes from 0.37s to 0.05s, which is "only" an 86% improvement. This modification to 'git add' creates some behavior change depending on the use of a sparse index. We modify a test in t1092 to demonstrate these changes which will be remedied in future changes. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-29 14:52:04 +00:00
test_all_match git add --sparse . &&
test_all_match git status --porcelain=v2 &&
test_all_match git commit -m folder1/new &&
add: allow operating on a sparse-only index Disable command_requires_full_index for 'git add'. This does not require any additional removals of ensure_full_index(). The main reason is that 'git add' discovers changes based on the pathspec and the worktree itself. These are then inserted into the index directly, and calls to index_name_pos() or index_file_exists() already call expand_to_path() at the appropriate time to support a sparse-index. Add a test to check that 'git add -A' and 'git add <file>' does not expand the index at all, as long as <file> is not within a sparse directory. This does not help the global 'git add .' case. We can measure the improvement using p2000-sparse-operations.sh with these results: Test HEAD~1 HEAD ------------------------------------------------------------------------------ 2000.6: git add -A (full-index-v3) 0.35(0.30+0.05) 0.37(0.29+0.06) +5.7% 2000.7: git add -A (full-index-v4) 0.31(0.26+0.06) 0.33(0.27+0.06) +6.5% 2000.8: git add -A (sparse-index-v3) 0.57(0.53+0.07) 0.05(0.04+0.08) -91.2% 2000.9: git add -A (sparse-index-v4) 0.58(0.55+0.06) 0.05(0.05+0.06) -91.4% While the 91% improvement seems impressive, it's important to recognize that previously we had significant overhead for expanding the sparse-index. Comparing to the full index case, 'git add -A' goes from 0.37s to 0.05s, which is "only" an 86% improvement. This modification to 'git add' creates some behavior change depending on the use of a sparse index. We modify a test in t1092 to demonstrate these changes which will be remedied in future changes. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-29 14:52:04 +00:00
test_all_match git rev-parse HEAD^{tree} &&
run_on_all ../edit-contents folder1/newer &&
test_all_match git add --sparse folder1/ &&
test_all_match git status --porcelain=v2 &&
add: allow operating on a sparse-only index Disable command_requires_full_index for 'git add'. This does not require any additional removals of ensure_full_index(). The main reason is that 'git add' discovers changes based on the pathspec and the worktree itself. These are then inserted into the index directly, and calls to index_name_pos() or index_file_exists() already call expand_to_path() at the appropriate time to support a sparse-index. Add a test to check that 'git add -A' and 'git add <file>' does not expand the index at all, as long as <file> is not within a sparse directory. This does not help the global 'git add .' case. We can measure the improvement using p2000-sparse-operations.sh with these results: Test HEAD~1 HEAD ------------------------------------------------------------------------------ 2000.6: git add -A (full-index-v3) 0.35(0.30+0.05) 0.37(0.29+0.06) +5.7% 2000.7: git add -A (full-index-v4) 0.31(0.26+0.06) 0.33(0.27+0.06) +6.5% 2000.8: git add -A (sparse-index-v3) 0.57(0.53+0.07) 0.05(0.04+0.08) -91.2% 2000.9: git add -A (sparse-index-v4) 0.58(0.55+0.06) 0.05(0.05+0.06) -91.4% While the 91% improvement seems impressive, it's important to recognize that previously we had significant overhead for expanding the sparse-index. Comparing to the full index case, 'git add -A' goes from 0.37s to 0.05s, which is "only" an 86% improvement. This modification to 'git add' creates some behavior change depending on the use of a sparse index. We modify a test in t1092 to demonstrate these changes which will be remedied in future changes. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-29 14:52:04 +00:00
test_all_match git commit -m folder1/newer &&
test_all_match git rev-parse HEAD^{tree}
'
test_expect_success 'checkout and reset --hard' '
init_repos &&
test_all_match git checkout update-folder1 &&
test_all_match git status --porcelain=v2 &&
test_all_match git checkout update-deep &&
test_all_match git status --porcelain=v2 &&
test_all_match git checkout -b reset-test &&
test_all_match git reset --hard deepest &&
test_all_match git reset --hard update-folder1 &&
test_all_match git reset --hard update-folder2
'
test_expect_success 'diff --cached' '
init_repos &&
write_script edit-contents <<-\EOF &&
echo text >>README.md
EOF
run_on_all ../edit-contents &&
test_all_match git diff &&
test_all_match git diff --cached &&
test_all_match git add README.md &&
test_all_match git diff &&
test_all_match git diff --cached
'
t1092: document bad 'git checkout' behavior Add new branches to the test repo that demonstrate directory/file conflicts in different ways. Since the directory 'folder1/' has adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes searches for 'folder1/' to land in a different place in the index than a search for 'folder1'. This causes a change in behavior when working with the df-conflict-1 and df-conflict-2 branches, whose only difference is that the first uses 'folder1' as the conflict and the other uses 'folder2' which does not have these adjacent files. We can extend two tests that compare the behavior across different 'git checkout' commands, and we see already that the behavior will be different in some cases and not in others. The difference between the two test loops is that one uses 'git reset --hard' between iterations. Further, we isolate the behavior of creating a staged change within a directory and then checking out a branch where that directory is replaced with a file. A full checkout behaves differently across these two cases, while a sparse-checkout cone behaves consistently. In both cases, the behavior is wrong. In one case, the staged change is dropped entirely. The other case the staged change is kept, replacing the file at that location, but none of the other files in the directory are kept. Likely, the correct behavior in this case is to reject the checkout and report the conflict, leaving HEAD in its previous location. None of the cases behave this way currently. Use comments to demonstrate that the tested behavior is only a documentation of the current, incorrect behavior to ensure we do not _accidentally_ change it. Instead, we would prefer to change it on purpose with a future change. At this point, the sparse-index does not handle these 'git checkout' commands correctly. Or rather, it _does_ reject the 'git checkout' when we have the staged change, but for the wrong reason. It also rejects the 'git checkout' commands when there is no staged change and we want to replace a directory with a file. A fix for that unstaged case will follow in the next change, but that will make the sparse-index agree with the full checkout case in these documented incorrect behaviors. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:40 +00:00
# NEEDSWORK: sparse-checkout behaves differently from full-checkout when
# running this test with 'df-conflict-2' after 'df-conflict-1'.
test_expect_success 'diff with renames and conflicts' '
init_repos &&
t1092: document bad 'git checkout' behavior Add new branches to the test repo that demonstrate directory/file conflicts in different ways. Since the directory 'folder1/' has adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes searches for 'folder1/' to land in a different place in the index than a search for 'folder1'. This causes a change in behavior when working with the df-conflict-1 and df-conflict-2 branches, whose only difference is that the first uses 'folder1' as the conflict and the other uses 'folder2' which does not have these adjacent files. We can extend two tests that compare the behavior across different 'git checkout' commands, and we see already that the behavior will be different in some cases and not in others. The difference between the two test loops is that one uses 'git reset --hard' between iterations. Further, we isolate the behavior of creating a staged change within a directory and then checking out a branch where that directory is replaced with a file. A full checkout behaves differently across these two cases, while a sparse-checkout cone behaves consistently. In both cases, the behavior is wrong. In one case, the staged change is dropped entirely. The other case the staged change is kept, replacing the file at that location, but none of the other files in the directory are kept. Likely, the correct behavior in this case is to reject the checkout and report the conflict, leaving HEAD in its previous location. None of the cases behave this way currently. Use comments to demonstrate that the tested behavior is only a documentation of the current, incorrect behavior to ensure we do not _accidentally_ change it. Instead, we would prefer to change it on purpose with a future change. At this point, the sparse-index does not handle these 'git checkout' commands correctly. Or rather, it _does_ reject the 'git checkout' when we have the staged change, but for the wrong reason. It also rejects the 'git checkout' commands when there is no staged change and we want to replace a directory with a file. A fix for that unstaged case will follow in the next change, but that will make the sparse-index agree with the full checkout case in these documented incorrect behaviors. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:40 +00:00
for branch in rename-out-to-out \
rename-out-to-in \
rename-in-to-out \
df-conflict-1 \
fd-conflict
do
test_all_match git checkout rename-base &&
test_all_match git checkout $branch -- . &&
test_all_match git status --porcelain=v2 &&
test_all_match git diff --cached --no-renames &&
test_all_match git diff --cached --find-renames || return 1
done
'
test_expect_success 'diff with directory/file conflicts' '
init_repos &&
t1092: document bad 'git checkout' behavior Add new branches to the test repo that demonstrate directory/file conflicts in different ways. Since the directory 'folder1/' has adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes searches for 'folder1/' to land in a different place in the index than a search for 'folder1'. This causes a change in behavior when working with the df-conflict-1 and df-conflict-2 branches, whose only difference is that the first uses 'folder1' as the conflict and the other uses 'folder2' which does not have these adjacent files. We can extend two tests that compare the behavior across different 'git checkout' commands, and we see already that the behavior will be different in some cases and not in others. The difference between the two test loops is that one uses 'git reset --hard' between iterations. Further, we isolate the behavior of creating a staged change within a directory and then checking out a branch where that directory is replaced with a file. A full checkout behaves differently across these two cases, while a sparse-checkout cone behaves consistently. In both cases, the behavior is wrong. In one case, the staged change is dropped entirely. The other case the staged change is kept, replacing the file at that location, but none of the other files in the directory are kept. Likely, the correct behavior in this case is to reject the checkout and report the conflict, leaving HEAD in its previous location. None of the cases behave this way currently. Use comments to demonstrate that the tested behavior is only a documentation of the current, incorrect behavior to ensure we do not _accidentally_ change it. Instead, we would prefer to change it on purpose with a future change. At this point, the sparse-index does not handle these 'git checkout' commands correctly. Or rather, it _does_ reject the 'git checkout' when we have the staged change, but for the wrong reason. It also rejects the 'git checkout' commands when there is no staged change and we want to replace a directory with a file. A fix for that unstaged case will follow in the next change, but that will make the sparse-index agree with the full checkout case in these documented incorrect behaviors. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:40 +00:00
for branch in rename-out-to-out \
rename-out-to-in \
rename-in-to-out \
unpack-trees: resolve sparse-directory/file conflicts When running unpack_trees() with a sparse index, we attempt to operate on the index without expanding the sparse directory entries. Thus, we operate by manipulating entire directories and passing them to the unpack function. In the case of the 'git checkout' command, this is the twoway_merge() function. There are several cases in twoway_merge() that handle different situations. One new one to add is the case of a directory/file conflict where the directory is sparse. Before the sparse index, such a conflict would appear as a list of file additions and deletions. Now, twoway_merge() initializes 'current', 'oldtree', and 'newtree' from src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is equal to the df_conflict_entry. The way to determine that we have a directory/file conflict is to test that 'current' and 'newtree' disagree on being sparse directory entries. When we are in this case, we want to resolve the situation by calling merged_entry(). This allows replacing the 'current' entry with the 'newtree' entry. This is important for cases where we want to run 'git checkout' across the conflict and have the new HEAD represent the new file type at that path. The first NEEDSWORK comment dropped in t1092 demonstrates this necessary behavior. However, we still are in a confusing state when 'current' corresponds to a staged change within a sparse directory that is not present at HEAD. This should be atypical, because it requires adding a change outside of the sparse-checkout cone, but it is possible. Since we are unable to determine that this is a staged change within twoway_merge(), we cannot add a case to reject the merge at this point. I believe this is due to the use of df_conflict_entry in the place of 'oldtree' instead of using the valud at HEAD, which would provide some perspective to this decision. Any change that would allow this differentiation for staged entries would need to involve information further up in unpack_trees(). That work should be done, sometime, because we are further confusing the behavior of a directory/file conflict when staging a change in the directory. The two cases 'checkout behaves oddly with df-conflict-?' in t1092 demonstrate that even without a sparse-checkout, Git is not consistent in its behavior. Neither of the two options seems correct, either. This change makes the sparse-index behave differently than the typcial sparse-checkout case, but it does match the full checkout behavior in the df-conflict-2 case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:41 +00:00
df-conflict-1 \
df-conflict-2 \
t1092: document bad 'git checkout' behavior Add new branches to the test repo that demonstrate directory/file conflicts in different ways. Since the directory 'folder1/' has adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes searches for 'folder1/' to land in a different place in the index than a search for 'folder1'. This causes a change in behavior when working with the df-conflict-1 and df-conflict-2 branches, whose only difference is that the first uses 'folder1' as the conflict and the other uses 'folder2' which does not have these adjacent files. We can extend two tests that compare the behavior across different 'git checkout' commands, and we see already that the behavior will be different in some cases and not in others. The difference between the two test loops is that one uses 'git reset --hard' between iterations. Further, we isolate the behavior of creating a staged change within a directory and then checking out a branch where that directory is replaced with a file. A full checkout behaves differently across these two cases, while a sparse-checkout cone behaves consistently. In both cases, the behavior is wrong. In one case, the staged change is dropped entirely. The other case the staged change is kept, replacing the file at that location, but none of the other files in the directory are kept. Likely, the correct behavior in this case is to reject the checkout and report the conflict, leaving HEAD in its previous location. None of the cases behave this way currently. Use comments to demonstrate that the tested behavior is only a documentation of the current, incorrect behavior to ensure we do not _accidentally_ change it. Instead, we would prefer to change it on purpose with a future change. At this point, the sparse-index does not handle these 'git checkout' commands correctly. Or rather, it _does_ reject the 'git checkout' when we have the staged change, but for the wrong reason. It also rejects the 'git checkout' commands when there is no staged change and we want to replace a directory with a file. A fix for that unstaged case will follow in the next change, but that will make the sparse-index agree with the full checkout case in these documented incorrect behaviors. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:40 +00:00
fd-conflict
do
git -C full-checkout reset --hard &&
test_sparse_match git reset --hard &&
test_all_match git checkout $branch &&
test_all_match git checkout rename-base -- . &&
test_all_match git status --porcelain=v2 &&
test_all_match git diff --cached --no-renames &&
test_all_match git diff --cached --find-renames || return 1
done
'
test_expect_success 'log with pathspec outside sparse definition' '
init_repos &&
test_all_match git log -- a &&
test_all_match git log -- folder1/a &&
test_all_match git log -- folder2/a &&
test_all_match git log -- deep/a &&
test_all_match git log -- deep/deeper1/a &&
test_all_match git log -- deep/deeper1/deepest/a &&
test_all_match git checkout update-folder1 &&
test_all_match git log -- folder1/a
'
test_expect_success 'blame with pathspec inside sparse definition' '
init_repos &&
blame: enable and test the sparse index Enable the sparse index for the 'git blame' command. The index was already not expanded with this command, so the most interesting thing to do is to add tests that verify that 'git blame' behaves correctly when the sparse index is enabled and that its performance improves. More specifically, these cases are: 1. The index is not expanded for 'blame' when given paths in the sparse checkout cone at multiple levels. 2. Performance measurably improves for 'blame' with sparse index when given paths in the sparse checkout cone at multiple levels. The `p2000` tests demonstrate a ~60% execution time reduction when running 'blame' for a file two levels deep and and a ~30% execution time reduction for a file three levels deep. Test before after ---------------------------------------------------------------- 2000.62: git blame f2/f4/a (full-v3) 0.31 0.32 +3.2% 2000.63: git blame f2/f4/a (full-v4) 0.29 0.31 +6.9% 2000.64: git blame f2/f4/a (sparse-v3) 0.55 0.23 -58.2% 2000.65: git blame f2/f4/a (sparse-v4) 0.57 0.23 -59.6% 2000.66: git blame f2/f4/f3/a (full-v3) 0.77 0.85 +10.4% 2000.67: git blame f2/f4/f3/a (full-v4) 0.78 0.81 +3.8% 2000.68: git blame f2/f4/f3/a (sparse-v3) 1.07 0.72 -32.7% 2000.99: git blame f2/f4/f3/a (sparse-v4) 1.05 0.73 -30.5% We do not include paths outside the sparse checkout cone because blame does not support blaming files that are not present in the working directory. This is true in both sparse and full checkouts. Signed-off-by: Lessley Dennington <lessleydennington@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-06 15:56:01 +00:00
for file in a \
deep/a \
deep/deeper1/a \
deep/deeper1/deepest/a
do
test_all_match git blame $file || return 1
blame: enable and test the sparse index Enable the sparse index for the 'git blame' command. The index was already not expanded with this command, so the most interesting thing to do is to add tests that verify that 'git blame' behaves correctly when the sparse index is enabled and that its performance improves. More specifically, these cases are: 1. The index is not expanded for 'blame' when given paths in the sparse checkout cone at multiple levels. 2. Performance measurably improves for 'blame' with sparse index when given paths in the sparse checkout cone at multiple levels. The `p2000` tests demonstrate a ~60% execution time reduction when running 'blame' for a file two levels deep and and a ~30% execution time reduction for a file three levels deep. Test before after ---------------------------------------------------------------- 2000.62: git blame f2/f4/a (full-v3) 0.31 0.32 +3.2% 2000.63: git blame f2/f4/a (full-v4) 0.29 0.31 +6.9% 2000.64: git blame f2/f4/a (sparse-v3) 0.55 0.23 -58.2% 2000.65: git blame f2/f4/a (sparse-v4) 0.57 0.23 -59.6% 2000.66: git blame f2/f4/f3/a (full-v3) 0.77 0.85 +10.4% 2000.67: git blame f2/f4/f3/a (full-v4) 0.78 0.81 +3.8% 2000.68: git blame f2/f4/f3/a (sparse-v3) 1.07 0.72 -32.7% 2000.99: git blame f2/f4/f3/a (sparse-v4) 1.05 0.73 -30.5% We do not include paths outside the sparse checkout cone because blame does not support blaming files that are not present in the working directory. This is true in both sparse and full checkouts. Signed-off-by: Lessley Dennington <lessleydennington@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-06 15:56:01 +00:00
done
'
blame: enable and test the sparse index Enable the sparse index for the 'git blame' command. The index was already not expanded with this command, so the most interesting thing to do is to add tests that verify that 'git blame' behaves correctly when the sparse index is enabled and that its performance improves. More specifically, these cases are: 1. The index is not expanded for 'blame' when given paths in the sparse checkout cone at multiple levels. 2. Performance measurably improves for 'blame' with sparse index when given paths in the sparse checkout cone at multiple levels. The `p2000` tests demonstrate a ~60% execution time reduction when running 'blame' for a file two levels deep and and a ~30% execution time reduction for a file three levels deep. Test before after ---------------------------------------------------------------- 2000.62: git blame f2/f4/a (full-v3) 0.31 0.32 +3.2% 2000.63: git blame f2/f4/a (full-v4) 0.29 0.31 +6.9% 2000.64: git blame f2/f4/a (sparse-v3) 0.55 0.23 -58.2% 2000.65: git blame f2/f4/a (sparse-v4) 0.57 0.23 -59.6% 2000.66: git blame f2/f4/f3/a (full-v3) 0.77 0.85 +10.4% 2000.67: git blame f2/f4/f3/a (full-v4) 0.78 0.81 +3.8% 2000.68: git blame f2/f4/f3/a (sparse-v3) 1.07 0.72 -32.7% 2000.99: git blame f2/f4/f3/a (sparse-v4) 1.05 0.73 -30.5% We do not include paths outside the sparse checkout cone because blame does not support blaming files that are not present in the working directory. This is true in both sparse and full checkouts. Signed-off-by: Lessley Dennington <lessleydennington@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-06 15:56:01 +00:00
# Without a revision specified, blame will error if passed any file that
# is not present in the working directory (even if the file is tracked).
# Here we just verify that this is also true with sparse checkouts.
test_expect_success 'blame with pathspec outside sparse definition' '
init_repos &&
blame: enable and test the sparse index Enable the sparse index for the 'git blame' command. The index was already not expanded with this command, so the most interesting thing to do is to add tests that verify that 'git blame' behaves correctly when the sparse index is enabled and that its performance improves. More specifically, these cases are: 1. The index is not expanded for 'blame' when given paths in the sparse checkout cone at multiple levels. 2. Performance measurably improves for 'blame' with sparse index when given paths in the sparse checkout cone at multiple levels. The `p2000` tests demonstrate a ~60% execution time reduction when running 'blame' for a file two levels deep and and a ~30% execution time reduction for a file three levels deep. Test before after ---------------------------------------------------------------- 2000.62: git blame f2/f4/a (full-v3) 0.31 0.32 +3.2% 2000.63: git blame f2/f4/a (full-v4) 0.29 0.31 +6.9% 2000.64: git blame f2/f4/a (sparse-v3) 0.55 0.23 -58.2% 2000.65: git blame f2/f4/a (sparse-v4) 0.57 0.23 -59.6% 2000.66: git blame f2/f4/f3/a (full-v3) 0.77 0.85 +10.4% 2000.67: git blame f2/f4/f3/a (full-v4) 0.78 0.81 +3.8% 2000.68: git blame f2/f4/f3/a (sparse-v3) 1.07 0.72 -32.7% 2000.99: git blame f2/f4/f3/a (sparse-v4) 1.05 0.73 -30.5% We do not include paths outside the sparse checkout cone because blame does not support blaming files that are not present in the working directory. This is true in both sparse and full checkouts. Signed-off-by: Lessley Dennington <lessleydennington@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-06 15:56:01 +00:00
test_sparse_match git sparse-checkout set &&
for file in \
blame: enable and test the sparse index Enable the sparse index for the 'git blame' command. The index was already not expanded with this command, so the most interesting thing to do is to add tests that verify that 'git blame' behaves correctly when the sparse index is enabled and that its performance improves. More specifically, these cases are: 1. The index is not expanded for 'blame' when given paths in the sparse checkout cone at multiple levels. 2. Performance measurably improves for 'blame' with sparse index when given paths in the sparse checkout cone at multiple levels. The `p2000` tests demonstrate a ~60% execution time reduction when running 'blame' for a file two levels deep and and a ~30% execution time reduction for a file three levels deep. Test before after ---------------------------------------------------------------- 2000.62: git blame f2/f4/a (full-v3) 0.31 0.32 +3.2% 2000.63: git blame f2/f4/a (full-v4) 0.29 0.31 +6.9% 2000.64: git blame f2/f4/a (sparse-v3) 0.55 0.23 -58.2% 2000.65: git blame f2/f4/a (sparse-v4) 0.57 0.23 -59.6% 2000.66: git blame f2/f4/f3/a (full-v3) 0.77 0.85 +10.4% 2000.67: git blame f2/f4/f3/a (full-v4) 0.78 0.81 +3.8% 2000.68: git blame f2/f4/f3/a (sparse-v3) 1.07 0.72 -32.7% 2000.99: git blame f2/f4/f3/a (sparse-v4) 1.05 0.73 -30.5% We do not include paths outside the sparse checkout cone because blame does not support blaming files that are not present in the working directory. This is true in both sparse and full checkouts. Signed-off-by: Lessley Dennington <lessleydennington@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-06 15:56:01 +00:00
deep/a \
deep/deeper1/a \
deep/deeper1/deepest/a
do
test_sparse_match test_must_fail git blame $file &&
cat >expect <<-EOF &&
fatal: Cannot lstat '"'"'$file'"'"': No such file or directory
EOF
# We compare sparse-checkout-err and sparse-index-err in
# `test_sparse_match`. Given we know they are the same, we
# only check the content of sparse-index-err here.
test_cmp expect sparse-index-err || return 1
blame: enable and test the sparse index Enable the sparse index for the 'git blame' command. The index was already not expanded with this command, so the most interesting thing to do is to add tests that verify that 'git blame' behaves correctly when the sparse index is enabled and that its performance improves. More specifically, these cases are: 1. The index is not expanded for 'blame' when given paths in the sparse checkout cone at multiple levels. 2. Performance measurably improves for 'blame' with sparse index when given paths in the sparse checkout cone at multiple levels. The `p2000` tests demonstrate a ~60% execution time reduction when running 'blame' for a file two levels deep and and a ~30% execution time reduction for a file three levels deep. Test before after ---------------------------------------------------------------- 2000.62: git blame f2/f4/a (full-v3) 0.31 0.32 +3.2% 2000.63: git blame f2/f4/a (full-v4) 0.29 0.31 +6.9% 2000.64: git blame f2/f4/a (sparse-v3) 0.55 0.23 -58.2% 2000.65: git blame f2/f4/a (sparse-v4) 0.57 0.23 -59.6% 2000.66: git blame f2/f4/f3/a (full-v3) 0.77 0.85 +10.4% 2000.67: git blame f2/f4/f3/a (full-v4) 0.78 0.81 +3.8% 2000.68: git blame f2/f4/f3/a (sparse-v3) 1.07 0.72 -32.7% 2000.99: git blame f2/f4/f3/a (sparse-v4) 1.05 0.73 -30.5% We do not include paths outside the sparse checkout cone because blame does not support blaming files that are not present in the working directory. This is true in both sparse and full checkouts. Signed-off-by: Lessley Dennington <lessleydennington@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-06 15:56:01 +00:00
done
'
reset: preserve skip-worktree bit in mixed reset Change `update_index_from_diff` to set `skip-worktree` when applicable for new index entries. When `git reset --mixed <tree-ish>` is run, entries in the index with differences between the pre-reset HEAD and reset <tree-ish> are identified and handled with `update_index_from_diff`. For each file, a new cache entry in inserted into the index, created from the <tree-ish> side of the reset (without changing the working tree). However, the newly-created entry must have `skip-worktree` explicitly set in either of the following scenarios: 1. the file is in the current index and has `skip-worktree` set 2. the file is not in the current index but is outside of a defined sparse checkout definition Not setting the `skip-worktree` bit leads to likely-undesirable results for a user. It causes `skip-worktree` settings to disappear on the "diff"-containing files (but *only* the diff-containing files), leading to those files now showing modifications in `git status`. For example, when running `git reset --mixed` in a sparse checkout, some file entries outside of sparse checkout could show up as deleted, despite the user never deleting anything (and not wanting them on-disk anyway). Additionally, add a test to `t7102` to ensure `skip-worktree` is preserved in a basic `git reset --mixed` scenario and update a failure-documenting test from 19a0acc (t1092: test interesting sparse-checkout scenarios, 2021-01-23) with new expected behavior. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-27 14:39:17 +00:00
test_expect_success 'checkout and reset (mixed)' '
init_repos &&
test_all_match git checkout -b reset-test update-deep &&
test_all_match git reset deepest &&
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
reset: preserve skip-worktree bit in mixed reset Change `update_index_from_diff` to set `skip-worktree` when applicable for new index entries. When `git reset --mixed <tree-ish>` is run, entries in the index with differences between the pre-reset HEAD and reset <tree-ish> are identified and handled with `update_index_from_diff`. For each file, a new cache entry in inserted into the index, created from the <tree-ish> side of the reset (without changing the working tree). However, the newly-created entry must have `skip-worktree` explicitly set in either of the following scenarios: 1. the file is in the current index and has `skip-worktree` set 2. the file is not in the current index but is outside of a defined sparse checkout definition Not setting the `skip-worktree` bit leads to likely-undesirable results for a user. It causes `skip-worktree` settings to disappear on the "diff"-containing files (but *only* the diff-containing files), leading to those files now showing modifications in `git status`. For example, when running `git reset --mixed` in a sparse checkout, some file entries outside of sparse checkout could show up as deleted, despite the user never deleting anything (and not wanting them on-disk anyway). Additionally, add a test to `t7102` to ensure `skip-worktree` is preserved in a basic `git reset --mixed` scenario and update a failure-documenting test from 19a0acc (t1092: test interesting sparse-checkout scenarios, 2021-01-23) with new expected behavior. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-27 14:39:17 +00:00
# Because skip-worktree is preserved, resetting to update-folder1
# will show worktree changes for folder1/a in full-checkout, but not
# in sparse-checkout or sparse-index.
git -C full-checkout reset update-folder1 >full-checkout-out &&
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
test_sparse_match git reset update-folder1 &&
reset: preserve skip-worktree bit in mixed reset Change `update_index_from_diff` to set `skip-worktree` when applicable for new index entries. When `git reset --mixed <tree-ish>` is run, entries in the index with differences between the pre-reset HEAD and reset <tree-ish> are identified and handled with `update_index_from_diff`. For each file, a new cache entry in inserted into the index, created from the <tree-ish> side of the reset (without changing the working tree). However, the newly-created entry must have `skip-worktree` explicitly set in either of the following scenarios: 1. the file is in the current index and has `skip-worktree` set 2. the file is not in the current index but is outside of a defined sparse checkout definition Not setting the `skip-worktree` bit leads to likely-undesirable results for a user. It causes `skip-worktree` settings to disappear on the "diff"-containing files (but *only* the diff-containing files), leading to those files now showing modifications in `git status`. For example, when running `git reset --mixed` in a sparse checkout, some file entries outside of sparse checkout could show up as deleted, despite the user never deleting anything (and not wanting them on-disk anyway). Additionally, add a test to `t7102` to ensure `skip-worktree` is preserved in a basic `git reset --mixed` scenario and update a failure-documenting test from 19a0acc (t1092: test interesting sparse-checkout scenarios, 2021-01-23) with new expected behavior. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-27 14:39:17 +00:00
grep "M folder1/a" full-checkout-out &&
! grep "M folder1/a" sparse-checkout-out &&
run_on_sparse test_path_is_missing folder1
'
test_expect_success 'checkout and reset (merge)' '
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
init_repos &&
write_script edit-contents <<-\EOF &&
echo text >>$1
EOF
test_all_match git checkout -b reset-test update-deep &&
run_on_all ../edit-contents a &&
test_all_match git reset --merge deepest &&
test_all_match git status --porcelain=v2 &&
test_all_match git reset --hard update-deep &&
run_on_all ../edit-contents deep/a &&
test_all_match test_must_fail git reset --merge deepest
'
test_expect_success 'checkout and reset (keep)' '
init_repos &&
write_script edit-contents <<-\EOF &&
echo text >>$1
EOF
test_all_match git checkout -b reset-test update-deep &&
run_on_all ../edit-contents a &&
test_all_match git reset --keep deepest &&
test_all_match git status --porcelain=v2 &&
test_all_match git reset --hard update-deep &&
run_on_all ../edit-contents deep/a &&
test_all_match test_must_fail git reset --keep deepest
'
unpack-trees: increment cache_bottom for sparse directories Correct tracking of the 'cache_bottom' for cases where sparse directories are present in the index. BACKGROUND ---------- The 'unpack_trees_options.cache_bottom' is a variable that tracks the in-progress "bottom" of the cache as 'unpack_trees()' iterates through the contents of the index. Most importantly, this value informs the sequential return values of 'next_cache_entry()' which, in the "diff cache" usage of 'unpack_callback()', are either unpacked as-is or are passed into the diff machinery. The 'cache_bottom' is intended to track the position of the first entry in the index that has not yet been diffed or unpacked. It is advanced in two main ways: either it is incremented when an index entry is marked as "used" (in 'mark_ce_used()'), indicating that it was unpacked or diffed, or when a directory is unpacked, in which case it is increased by an amount equaling the number of index entries inside that tree. In 17a1bb570b (unpack-trees: preserve cache_bottom, 2021-07-14), it was identified that sparse directories posed a problem to the above 'cache_bottom' advancement logic - because a sparse directory was both an index entry that could be "used" and a directory that can be unpacked, the 'cache_bottom' would be incremented too many times. To solve this problem, the 'mark_ce_used()' advancement of 'cache_bottom' was skipped for sparse directories. INCORRECT CACHE_BOTTOM TRACKING ------------------------------- Skipping the 'cache_bottom' advancement for sparse directories in 'mark_ce_used()' breaks down in two cases: 1. When the 'unpack_trees()' operation is *not* a "cache diff" (because the directory contents-based incrementing of 'cache_bottom' does not happen). 2. When a cache diff is performed with a pathspec (because 'unpack_index_entry()' will unpack a sparse directory not matched by the pathspec without performing the directory contents-based increment). The former luckily does not appear to affect 'git' behavior, likely because 'cache_bottom' is largely unused (non-"cache diff" 'unpack_trees()' uses 'find_index_entry()' - rather than 'next_cache_entry()' - to find the index entries to unpack). The latter, however, causes 'cache_bottom' to "lag behind" its intended position by an amount equal to the number of sparse directories unpacked so far with 'unpack_index_entry()'. If a repository is structured such that any sparse directories are ordered lexicographically *after* any pathspec-matching directories, though, this issue won't present any adverse behavior. This was the case with the 't1092-sparse-checkout-compatibility.sh' tests before the addition of the 'before/' sparse directory (ordered *before* the in-cone 'deep/' directory), therefore sidestepping the issue. Once the 'before/' directory was added, though, 'cache_bottom' began to lag behind its intended position, causing 'next_cache_entry()' to return index entries it had already processed and, ultimately, an incorrect diff. CORRECTING CACHE_BOTTOM ----------------------- The problems observed in 't1092' come from 'cache_bottom' lagging behind in cases where the cache tree-based advancement doesn't occur. To solve this, then, the fix in 17a1bb570b is "reversed"; rather than skipping 'cache_bottom' advancement in 'mark_ce_used()', we skip the directory contents-based advancement for sparse directories. Now, every index entry can be accounted for in 'cache_bottom': * if you're working with a single index entry, 'cache_bottom' is incremented in 'mark_ce_used()' * if you're working with a directory that contains index entries (but is not one itself), 'cache_bottom' is incremented by the number of entries in that directory. Finally, change the 'test_expect_failure' tests in 't1092' failing due to this bug back to 'test_expect_success'. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-17 15:55:35 +00:00
test_expect_success 'reset with pathspecs inside sparse definition' '
init_repos &&
write_script edit-contents <<-\EOF &&
echo text >>$1
EOF
test_all_match git checkout -b reset-test update-deep &&
run_on_all ../edit-contents deep/a &&
test_all_match git reset base -- deep/a &&
test_all_match git status --porcelain=v2 &&
test_all_match git reset base -- nonexistent-file &&
test_all_match git status --porcelain=v2 &&
test_all_match git reset deepest -- deep &&
test_all_match git status --porcelain=v2
'
# Although the working tree differs between full and sparse checkouts after
# reset, the state of the index is the same.
test_expect_success 'reset with pathspecs outside sparse definition' '
init_repos &&
test_all_match git checkout -b reset-test base &&
test_sparse_match git reset update-folder1 -- folder1 &&
git -C full-checkout reset update-folder1 -- folder1 &&
test_all_match git ls-files -s -- folder1 &&
test_sparse_match git reset update-folder2 -- folder2/a &&
git -C full-checkout reset update-folder2 -- folder2/a &&
test_all_match git ls-files -s -- folder2/a
'
test_expect_success 'reset with wildcard pathspec' '
init_repos &&
test_all_match git reset update-deep -- deep\* &&
test_all_match git ls-files -s -- deep &&
test_all_match git reset deepest -- deep\*\*\* &&
test_all_match git ls-files -s -- deep &&
# The following `git reset`s result in updating the index on files with
# `skip-worktree` enabled. To avoid failing due to discrepencies in reported
# "modified" files, `test_sparse_match` reset is performed separately from
# "full-checkout" reset, then the index contents of all repos are verified.
test_sparse_match git reset update-folder1 -- \*/a &&
git -C full-checkout reset update-folder1 -- \*/a &&
test_all_match git ls-files -s -- deep/a folder1/a &&
test_sparse_match git reset update-folder2 -- folder\* &&
git -C full-checkout reset update-folder2 -- folder\* &&
test_all_match git ls-files -s -- folder10 folder1 folder2 &&
test_sparse_match git reset base -- folder1/\* &&
git -C full-checkout reset base -- folder1/\* &&
test_all_match git ls-files -s -- folder1
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
'
test_expect_success 'reset hard with removed sparse dir' '
init_repos &&
run_on_all git rm -r --sparse folder1 &&
test_all_match git status --porcelain=v2 &&
test_all_match git reset --hard &&
test_all_match git status --porcelain=v2 &&
cat >expect <<-\EOF &&
folder1/
EOF
git -C sparse-index ls-files --sparse folder1 >out &&
test_cmp expect out
'
update-index: add tests for sparse-checkout compatibility Introduce tests for a variety of `git update-index` use cases, including performance scenarios. Tests are intended to exercise `update-index` with options that change the commands interaction with the index (e.g., `--again`) and with files/directories inside and outside a sparse checkout cone. Of note is that these tests clearly establish the behavior of `git update-index --add` with untracked, outside-of-cone files. Unlike `git add`, which fails with an error when provided with such files, `update-index` succeeds in adding them to the index. Additionally, the `skip-worktree` flag is *not* automatically added to the new entry. Although this is pre-existing behavior, there are a couple of reasons to avoid changing it in favor of consistency with e.g. `git add`: * `update-index` is low-level command for modifying the index; while it can perform operations similar to those of `add`, it traditionally has fewer "guardrails" preventing a user from doing something they may not want to do (in this case, adding an outside-of-cone, non-`skip-worktree` file to the index) * `update-index` typically only exits with an error code if it is incapable of performing an operation (e.g., if an internal function call fails); adding a new file outside the sparse checkout definition is still a valid operation, albeit an inadvisable one * `update-index` does not implicitly set flags (e.g., `skip-worktree`) when creating new index entries with `--add`; if flags need to be updated, options like `--[no-]skip-worktree` allow a user to intentionally set them All this to say that, while there are valid reasons to consider changing the treatment of outside-of-cone files in `update-index`, there are also sufficient reasons for leaving it as-is. Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-11 18:05:04 +00:00
test_expect_success 'update-index modify outside sparse definition' '
init_repos &&
write_script edit-contents <<-\EOF &&
echo text >>$1
EOF
# Create & modify folder1/a
# Note that this setup is a manual way of reaching the erroneous
# condition in which a `skip-worktree` enabled, outside-of-cone file
# exists on disk. It is used here to ensure `update-index` is stable
# and behaves predictably if such a condition occurs.
run_on_sparse mkdir -p folder1 &&
run_on_sparse cp ../initial-repo/folder1/a folder1/a &&
run_on_all ../edit-contents folder1/a &&
repo_read_index: clear SKIP_WORKTREE bit from files present in worktree The fix is short (~30 lines), but the description is not. Sorry. There is a set of problems caused by files in what I'll refer to as the "present-despite-SKIP_WORKTREE" state. This commit aims to not just fix these problems, but remove the entire class as a possibility -- for those using sparse checkouts. But first, we need to understand the problems this class presents. A quick outline: * Problems * User facing issues * Problem space complexity * Maintenance and code correctness challenges * SKIP_WORKTREE expectations in Git * Suggested solution * Pros/Cons of suggested solution * Notes on testcase modifications === User facing issues === There are various ways for users to get files to be present in the working copy despite having the SKIP_WORKTREE bit set for that file in the index. This may come from: * various git commands not really supporting the SKIP_WORKTREE bit[1,2] * users grabbing files from elsewhere and writing them to the worktree (perhaps even cached in their editor) * users attempting to "abort" a sparse-checkout operation with a not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the working tree is not atomic)[3]. Once users have present-despite-SKIP_WORKTREE files, any modifications users make to these files will be ignored, possibly to users' confusion. Further: * these files will degrade performance for the sparse-index case due to requiring the index to be expanded (see commit 55dfcf9591 ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why we try to delete entire directories outside the sparse cone). * these files will not be updated by by standard commands (switch/checkout/pull/merge/rebase will leave them alone unless conflicts happen -- and even then, the conflicted file may be written somewhere else to avoid overwriting the SKIP_WORKTREE file that is present and in the way) * there is nothing in Git that users can use to discover such files (status, diff, grep, etc. all ignore it) * there is no reasonable mechanism to "recover" from such a condition (neither `git sparse-checkout reapply` nor `git reset --hard` will correct it). So, not only are users modifications ignored, but the files get progressively more stale over time. At some point in the future, they may change their sparseness specification or disable sparse-checkouts. At that time, all present-despite-SKIP_WORKTREE files will show up as having lots of modifications because they represent a version from a different branch or commit. These might include user-made local changes from days before, but the only way to tell is to have users look through them all closely. If these users come to others for help, there will be no logs that explain the issue; it's just a mysterious list of changes. Users might adamantly claim (correctly, as it turns out) that they didn't modify these files, while others presume they did. [1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ [2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/ === Problem space complexity === SKIP_WORKTREE has been part of Git for over a decade. Duy did lots of work on it initially, and several others have since come along and put lots of work into it. Stolee spent most of 2021 on the sparse-index, with lots of bugfixes along the way including to non-sparse-index cases as we are still trying to get sparse checkouts to behave reasonably. Basically every codepath throughout the treat needs to be aware of an additional type of file: tracked-but-not-present. The extra type results in lots of extra testcases and lots of extra code everywhere. But, the sad thing is that we actually have more than one extra type. We have tracked, tracked-but-not-present (SKIP_WORKTREE), and tracked-but-promised-to-not-be-present-but-is-present-anyway (present-despite-SKIP_WORKTREE). Two types is a monumental amount of effort to support, and adding a third feels a bit like insanity[4]. [4] Some examples of which can be seen at https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ === Maintenance and code correctness challenges === Matheus' patches to grep stalled for nearly a year, in part because of complications of how to handle sparse-checkouts appropriately in all cases[5][6] (with trying to sanely figure out how to sanely handle present-despite-SKIP_WORKTREE files being one of the complications). His rm/add follow-ups also took months because of those kinds of issues[7]. The corner cases with things like submodules and SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start becoming really complex[8]. We've had to add ugly logic to merge-ort to attempt to handle present-despite-SKIP_WORKTREE files[9], and basically just been forced to give up in merge-recursive knowing full well that we'll sometimes silently discard user modifications. Despite stash essentially being a merge, it needed extra code (beyond what was in merge-ort and merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid a few different bugs that'd result in an early abort with a partial stash application[10]. [5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t and the dates on the thread; also Matheus and I had several conversations off-list trying to resolve the issues over that time [6] ...it finally kind of got unstuck after https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ [7] See for example https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t and quotes like "The core functionality of sparse-checkout has always been only partially implemented", a statement I still believe is true today. [8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/ [9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries", 2021-03-20) [10] See commit ba359fd507 ("stash: fix stash application in sparse-checkouts", 2020-12-01) === SKIP_WORKTREE expectations in Git === A couple quotes: * From [11] (before the "sparse-checkout" command existed): If it needs too many special cases, hacks, and conditionals, then it is not worth the complexity---if it is easier to write a correct code by allowing Git to populate working tree files, it is perfectly fine to do so. In a sense, the sparse checkout "feature" itself is a hack by itself, and that is why I think this part should be "best effort" as well. * From the git-sparse-checkout manual (still present today): THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE. [11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ === Suggested solution === SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as the name of the option implies, to allow the file to NOT be in the worktree but consider it to be unchanged rather than deleted. The suggests a simple solution: present-despite-SKIP_WORKTREE files should not exist, for those using sparse-checkouts. Enforce this at index loading time by checking if core.sparseCheckout is true; if so, check files in the index with the SKIP_WORKTREE bit set to verify that they are absent from the working tree. If they are present, unset the bit (in memory, though any commands that write to the index will record the update). Users can, of course, can get the SKIP_WORKTREE bit back such as by running `git sparse-checkout reapply` (if they have ensured the file is unmodified and doesn't match the specified sparsity patterns). === Pros/Cons of suggested solution === Pros: * Solves the user visible problems reported above, which I've been complaining about for nearly a year but couldn't find a solution to. * Helps prevent slow performance degradation with a sparse-index. * Much easier behavior in sparse-checkouts for users to reason about * Very simple, ~30 lines of code. * Significantly simplifies some ugly testcases, and obviates the need to test an entire class of potential issues. * Reduces code complexity, reasoning, and maintenance. Avoids disagreements about weird corner cases[12]. * It has been reported that some users might be (ab)using SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree mechanism[13, and a few other similar references]. These users know of multiple caveats and shortcomings in doing so; perhaps not surprising given the "SKIP_WORKTREE expecations" section above. However, these users use `git update-index --skip-worktree`, and not `git sparse-checkout` or core.sparseCheckout=true. As such, these users would be unaffected by this change and can continue abusing the system as before. [12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree Cons: * When core.sparseCheckout is enabled, this adds a performance cost to reading the index. I'll defer discussion of this cost to a subsequent patch, since I have some optimizations to add. === Notes on testcase modifications === The good: * t1011: Compare to two cases above it ('read-tree will not throw away dirty changes, non-sparse'); since the file is present, it should match the non-sparse case now * t1092: sparse-index & sparse-checkout now match full-worktree behavior in more cases! Yaay for consistency! * t6428, t7012: look at how much simpler the tests become! Merge and stash can just fail early telling the user there's a file in the way, instead of not noticing until it's about to write a file and then have to implement sudden crash avoidance. Hurray for sanity! * t7817: sparse behavior better matches full tree behavior. Hurray for sanity! The confusing: * t3705: These changes were ONLY needed on Windows, but they don't hurt other platforms. Let's discuss each individually: * core.sparseCheckout should be false by default. Nothing in this testcase toggles that until many, many tests later. However, early tests (#5 in particular) were testing `update-index --skip-worktree` behavior in a non-sparse-checkout, but the Windows tests in CI were behaving as if core.sparseCheckout=true had been specified somewhere. I do not have access to a Windows machine. But I just manually did what should have been a no-op and turned the config off. And it fixed the test. * I have no idea why the leftover .gitattributes file from this test was causing failures for test #18 on Windows, but only with these changes of mine. Test #18 was checking for empty stderr, and specifically wanted to know that some error completely unrelated to file endings did not appear. The leftover .gitattributes file thus caused some spurious stderr unrelated to the thing being checked. Since other tests did not intend to test normalization, just proactively remove the .gitattributes file. I'm certain this is cleaner and better, I'm just unsure why/how this didn't trigger problems before. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-14 15:59:41 +00:00
# If file has skip-worktree enabled, but the file is present, it is
# treated the same as if skip-worktree is disabled
test_all_match git status --porcelain=v2 &&
test_all_match git update-index folder1/a &&
test_all_match git status --porcelain=v2 &&
update-index: add tests for sparse-checkout compatibility Introduce tests for a variety of `git update-index` use cases, including performance scenarios. Tests are intended to exercise `update-index` with options that change the commands interaction with the index (e.g., `--again`) and with files/directories inside and outside a sparse checkout cone. Of note is that these tests clearly establish the behavior of `git update-index --add` with untracked, outside-of-cone files. Unlike `git add`, which fails with an error when provided with such files, `update-index` succeeds in adding them to the index. Additionally, the `skip-worktree` flag is *not* automatically added to the new entry. Although this is pre-existing behavior, there are a couple of reasons to avoid changing it in favor of consistency with e.g. `git add`: * `update-index` is low-level command for modifying the index; while it can perform operations similar to those of `add`, it traditionally has fewer "guardrails" preventing a user from doing something they may not want to do (in this case, adding an outside-of-cone, non-`skip-worktree` file to the index) * `update-index` typically only exits with an error code if it is incapable of performing an operation (e.g., if an internal function call fails); adding a new file outside the sparse checkout definition is still a valid operation, albeit an inadvisable one * `update-index` does not implicitly set flags (e.g., `skip-worktree`) when creating new index entries with `--add`; if flags need to be updated, options like `--[no-]skip-worktree` allow a user to intentionally set them All this to say that, while there are valid reasons to consider changing the treatment of outside-of-cone files in `update-index`, there are also sufficient reasons for leaving it as-is. Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Victoria Dye <vdye@github.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-11 18:05:04 +00:00
# When skip-worktree is disabled (even on files outside sparse cone), file
# is updated in the index
test_sparse_match git update-index --no-skip-worktree folder1/a &&
test_all_match git status --porcelain=v2 &&
test_all_match git update-index folder1/a &&
test_all_match git status --porcelain=v2
'
test_expect_success 'update-index --add outside sparse definition' '
init_repos &&
write_script edit-contents <<-\EOF &&
echo text >>$1
EOF
# Create folder1, add new file
run_on_sparse mkdir -p folder1 &&
run_on_all ../edit-contents folder1/b &&
# The *untracked* out-of-cone file is added to the index because it does
# not have a `skip-worktree` bit to signal that it should be ignored
# (unlike in `git add`, which will fail due to the file being outside
# the sparse checkout definition).
test_all_match git update-index --add folder1/b &&
test_all_match git status --porcelain=v2
'
# NEEDSWORK: `--remove`, unlike the rest of `update-index`, does not ignore
# `skip-worktree` entries by default and will remove them from the index.
# The `--ignore-skip-worktree-entries` flag must be used in conjunction with
# `--remove` to ignore the `skip-worktree` entries and prevent their removal
# from the index.
test_expect_success 'update-index --remove outside sparse definition' '
init_repos &&
# When --ignore-skip-worktree-entries is _not_ specified:
# out-of-cone, not-on-disk files are removed from the index
test_sparse_match git update-index --remove folder1/a &&
cat >expect <<-EOF &&
D folder1/a
EOF
test_sparse_match git diff --cached --name-status &&
test_cmp expect sparse-checkout-out &&
# Reset the state
test_all_match git reset --hard &&
# When --ignore-skip-worktree-entries is specified, out-of-cone
# (skip-worktree) files are ignored
test_sparse_match git update-index --remove --ignore-skip-worktree-entries folder1/a &&
test_sparse_match git diff --cached --name-status &&
test_must_be_empty sparse-checkout-out &&
# Reset the state
test_all_match git reset --hard &&
# --force-remove supercedes --ignore-skip-worktree-entries, removing
# a skip-worktree file from the index (and disk) when both are specified
# with --remove
test_sparse_match git update-index --force-remove --ignore-skip-worktree-entries folder1/a &&
cat >expect <<-EOF &&
D folder1/a
EOF
test_sparse_match git diff --cached --name-status &&
test_cmp expect sparse-checkout-out
'
test_expect_success 'update-index with directories' '
init_repos &&
# update-index will exit silently when provided with a directory name
# containing a trailing slash
test_all_match git update-index deep/ folder1/ &&
grep "Ignoring path deep/" sparse-checkout-err &&
grep "Ignoring path folder1/" sparse-checkout-err &&
# When update-index is given a directory name WITHOUT a trailing slash, it will
# behave in different ways depending on the status of the directory on disk:
# * if it exists, the command exits with an error ("add individual files instead")
# * if it does NOT exist (e.g., in a sparse-checkout), it is assumed to be a
# file and either triggers an error ("does not exist and --remove not passed")
# or is ignored completely (when using --remove)
test_all_match test_must_fail git update-index deep &&
run_on_all test_must_fail git update-index folder1 &&
test_must_fail git -C full-checkout update-index --remove folder1 &&
test_sparse_match git update-index --remove folder1 &&
test_all_match git status --porcelain=v2
'
test_expect_success 'update-index --again file outside sparse definition' '
init_repos &&
test_all_match git checkout -b test-reupdate &&
# Update HEAD without modifying the index to introduce a difference in
# folder1/a
test_sparse_match git reset --soft update-folder1 &&
# Because folder1/a differs in the index vs HEAD,
# `git update-index --no-skip-worktree --again` will effectively perform
# `git update-index --no-skip-worktree folder1/a` and remove the skip-worktree
# flag from folder1/a
test_sparse_match git update-index --no-skip-worktree --again &&
test_sparse_match git status --porcelain=v2 &&
cat >expect <<-EOF &&
D folder1/a
EOF
test_sparse_match git diff --name-status &&
test_cmp expect sparse-checkout-out
'
test_expect_success 'update-index --cacheinfo' '
init_repos &&
deep_a_oid=$(git -C full-checkout rev-parse update-deep:deep/a) &&
folder2_oid=$(git -C full-checkout rev-parse update-folder2:folder2) &&
folder1_a_oid=$(git -C full-checkout rev-parse update-folder1:folder1/a) &&
test_all_match git update-index --cacheinfo 100644 $deep_a_oid deep/a &&
test_all_match git status --porcelain=v2 &&
# Cannot add sparse directory, even in sparse index case
test_all_match test_must_fail git update-index --add --cacheinfo 040000 $folder2_oid folder2/ &&
# Sparse match only: the new outside-of-cone entry is added *without* skip-worktree,
# so `git status` reports it as "deleted" in the worktree
test_sparse_match git update-index --add --cacheinfo 100644 $folder1_a_oid folder1/a &&
test_sparse_match git status --porcelain=v2 &&
cat >expect <<-EOF &&
MD folder1/a
EOF
test_sparse_match git status --short -- folder1/a &&
test_cmp expect sparse-checkout-out &&
# To return folder1/a to "normal" for a sparse checkout (ignored &
# outside-of-cone), add the skip-worktree flag.
test_sparse_match git update-index --skip-worktree folder1/a &&
cat >expect <<-EOF &&
S folder1/a
EOF
test_sparse_match git ls-files -t -- folder1/a &&
test_cmp expect sparse-checkout-out
'
for MERGE_TREES in "base HEAD update-folder2" \
"update-folder1 update-folder2" \
"update-folder2"
do
test_expect_success "'read-tree -mu $MERGE_TREES' with files outside sparse definition" '
init_repos &&
# Although the index matches, without --no-sparse-checkout, outside-of-
# definition files will not exist on disk for sparse checkouts
test_all_match git read-tree -mu $MERGE_TREES &&
test_all_match git status --porcelain=v2 &&
test_path_is_missing sparse-checkout/folder2 &&
test_path_is_missing sparse-index/folder2 &&
test_all_match git read-tree --reset -u HEAD &&
test_all_match git status --porcelain=v2 &&
test_all_match git read-tree -mu --no-sparse-checkout $MERGE_TREES &&
test_all_match git status --porcelain=v2 &&
test_cmp sparse-checkout/folder2/a sparse-index/folder2/a &&
test_cmp sparse-checkout/folder2/a full-checkout/folder2/a
'
done
test_expect_success 'read-tree --merge with edit/edit conflicts in sparse directories' '
init_repos &&
# Merge of multiple changes to same directory (but not same files) should
# succeed
test_all_match git read-tree -mu base rename-base update-folder1 &&
test_all_match git status --porcelain=v2 &&
test_all_match git reset --hard &&
test_all_match git read-tree -mu rename-base update-folder2 &&
test_all_match git status --porcelain=v2 &&
test_all_match git reset --hard &&
test_all_match test_must_fail git read-tree -mu base update-folder1 rename-out-to-in &&
test_all_match test_must_fail git read-tree -mu rename-out-to-in update-folder1
'
test_expect_success 'read-tree --prefix' '
init_repos &&
# If files differing between the index and target <commit-ish> exist
# inside the prefix, `read-tree --prefix` should fail
test_all_match test_must_fail git read-tree --prefix=deep/ deepest &&
test_all_match test_must_fail git read-tree --prefix=folder1/ update-folder1 &&
# If no differing index entries exist matching the prefix,
# `read-tree --prefix` updates the index successfully
test_all_match git rm -rf deep/deeper1/deepest/ &&
test_all_match git read-tree --prefix=deep/deeper1/deepest -u deepest &&
test_all_match git status --porcelain=v2 &&
rm: integrate with sparse-index Enable the sparse index within the `git-rm` command. The `p2000` tests demonstrate a ~92% execution time reduction for 'git rm' using a sparse index. Test HEAD~1 HEAD -------------------------------------------------------------------------- 2000.74: git rm ... (full-v3) 0.41(0.37+0.05) 0.43(0.36+0.07) +4.9% 2000.75: git rm ... (full-v4) 0.38(0.34+0.05) 0.39(0.35+0.05) +2.6% 2000.76: git rm ... (sparse-v3) 0.57(0.56+0.01) 0.05(0.05+0.00) -91.2% 2000.77: git rm ... (sparse-v4) 0.57(0.55+0.02) 0.03(0.03+0.00) -94.7% ---- Also, normalize a behavioral difference of `git-rm` under sparse-index. See related discussion [1]. `git-rm` a sparse-directory entry within a sparse-index enabled repo behaves differently from a sparse directory within a sparse-checkout enabled repo. For example, in a sparse-index repo, where 'folder1' is a sparse-directory entry, `git rm -r --sparse folder1` provides this: rm 'folder1/' Whereas in a sparse-checkout repo *without* sparse-index, doing so provides this: rm 'folder1/0/0/0' rm 'folder1/0/1' rm 'folder1/a' Because `git rm` a sparse-directory entry does not need to expand the index, therefore we should accept the current behavior, which is faster than "expand the sparse-directory entry to match the sparse-checkout situation". Modify a previous test so such difference is not considered as an error. [1] https://github.com/ffyuanda/git/pull/6#discussion_r934861398 Helped-by: Victoria Dye <vdye@github.com> Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-07 04:13:35 +00:00
run_on_all git rm -rf --sparse folder1/ &&
test_all_match git read-tree --prefix=folder1/ -u update-folder1 &&
test_all_match git status --porcelain=v2 &&
test_all_match git rm -rf --sparse folder2/0 &&
test_all_match git read-tree --prefix=folder2/0/ -u rename-out-to-out &&
test_all_match git status --porcelain=v2
'
test_expect_success 'read-tree --merge with directory-file conflicts' '
init_repos &&
test_all_match git checkout -b test-branch rename-base &&
# Although the index matches, without --no-sparse-checkout, outside-of-
# definition files will not exist on disk for sparse checkouts
test_sparse_match git read-tree -mu rename-out-to-out &&
test_sparse_match git status --porcelain=v2 &&
test_path_is_missing sparse-checkout/folder2 &&
test_path_is_missing sparse-index/folder2 &&
test_sparse_match git read-tree --reset -u HEAD &&
test_sparse_match git status --porcelain=v2 &&
test_sparse_match git read-tree -mu --no-sparse-checkout rename-out-to-out &&
test_sparse_match git status --porcelain=v2 &&
test_cmp sparse-checkout/folder2/0/1 sparse-index/folder2/0/1
'
test_expect_success 'merge, cherry-pick, and rebase' '
init_repos &&
for OPERATION in "merge -m merge" cherry-pick "rebase --apply" "rebase --merge"
do
test_all_match git checkout -B temp update-deep &&
test_all_match git $OPERATION update-folder1 &&
test_all_match git rev-parse HEAD^{tree} &&
test_all_match git $OPERATION update-folder2 &&
test_all_match git rev-parse HEAD^{tree} || return 1
done
'
test_expect_success 'merge with conflict outside cone' '
init_repos &&
test_all_match git checkout -b merge-tip merge-left &&
test_all_match git status --porcelain=v2 &&
test_all_match test_must_fail git merge -m merge merge-right &&
test_all_match git status --porcelain=v2 &&
# Resolve the conflict in different ways:
# 1. Revert to the base
test_all_match git checkout base -- deep/deeper2/a &&
test_all_match git status --porcelain=v2 &&
# 2. Add the file with conflict markers
test_sparse_match test_must_fail git add folder1/a &&
grep "Disable or modify the sparsity rules" sparse-checkout-err &&
test_sparse_unstaged folder1/a &&
test_all_match git add --sparse folder1/a &&
test_all_match git status --porcelain=v2 &&
# 3. Rename the file to another sparse filename and
# accept conflict markers as resolved content.
run_on_all mv folder2/a folder2/z &&
test_sparse_match test_must_fail git add folder2 &&
grep "Disable or modify the sparsity rules" sparse-checkout-err &&
test_sparse_unstaged folder2/z &&
test_all_match git add --sparse folder2 &&
test_all_match git status --porcelain=v2 &&
test_all_match git merge --continue &&
test_all_match git status --porcelain=v2 &&
test_all_match git rev-parse HEAD^{tree}
'
test_expect_success 'cherry-pick/rebase with conflict outside cone' '
init_repos &&
for OPERATION in cherry-pick rebase
do
test_all_match git checkout -B tip &&
test_all_match git reset --hard merge-left &&
test_all_match git status --porcelain=v2 &&
test_all_match test_must_fail git $OPERATION merge-right &&
test_all_match git status --porcelain=v2 &&
# Resolve the conflict in different ways:
# 1. Revert to the base
test_all_match git checkout base -- deep/deeper2/a &&
test_all_match git status --porcelain=v2 &&
# 2. Add the file with conflict markers
# NEEDSWORK: Even though the merge conflict removed the
# SKIP_WORKTREE bit from the index entry for folder1/a, we should
# warn that this is a problematic add.
test_sparse_match test_must_fail git add folder1/a &&
grep "Disable or modify the sparsity rules" sparse-checkout-err &&
test_sparse_unstaged folder1/a &&
test_all_match git add --sparse folder1/a &&
test_all_match git status --porcelain=v2 &&
# 3. Rename the file to another sparse filename and
# accept conflict markers as resolved content.
# NEEDSWORK: This mode now fails, because folder2/z is
# outside of the sparse-checkout cone and does not match an
# existing index entry with the SKIP_WORKTREE bit cleared.
run_on_all mv folder2/a folder2/z &&
test_sparse_match test_must_fail git add folder2 &&
grep "Disable or modify the sparsity rules" sparse-checkout-err &&
test_sparse_unstaged folder2/z &&
test_all_match git add --sparse folder2 &&
test_all_match git status --porcelain=v2 &&
test_all_match git $OPERATION --continue &&
test_all_match git status --porcelain=v2 &&
test_all_match git rev-parse HEAD^{tree} || return 1
done
'
test_expect_success 'merge with outside renames' '
init_repos &&
for type in out-to-out out-to-in in-to-out
do
test_all_match git reset --hard &&
test_all_match git checkout -f -b merge-$type update-deep &&
test_all_match git merge -m "$type" rename-$type &&
test_all_match git rev-parse HEAD^{tree} || return 1
done
'
sparse-index: skip indexes with unmerged entries The sparse-index format is designed to be compatible with merge conflicts, even those outside the sparse-checkout definition. The reason is that when converting a full index to a sparse one, a cache entry with nonzero stage will not be collapsed into a sparse directory entry. However, this behavior was not tested, and a different behavior within convert_to_sparse() fails in this scenario. Specifically, cache_tree_update() will fail when unmerged entries exist. convert_to_sparse_rec() uses the cache-tree data to recursively walk the tree structure, but also to compute the OIDs used in the sparse-directory entries. Add an index scan to convert_to_sparse() that will detect if these merge conflict entries exist and skip the conversion before trying to update the cache-tree. This is marked as NEEDSWORK because this can be removed with a suitable update to cache_tree_update() or a similar method that can construct a cache-tree with invalid nodes, but still allow creating the nodes necessary for creating sparse directory entries. It is possible that in the future we will not need to make such an update, since if we do not expand a sparse-index into a full one, this conversion does not need to happen. Thus, this can be deferred until the merge machinery is made to integrate with the sparse-index. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-14 13:12:25 +00:00
# Sparse-index fails to convert the index in the
# final 'git cherry-pick' command.
test_expect_success 'cherry-pick with conflicts' '
init_repos &&
write_script edit-conflict <<-\EOF &&
echo $1 >conflict
EOF
test_all_match git checkout -b to-cherry-pick &&
run_on_all ../edit-conflict ABC &&
test_all_match git add conflict &&
test_all_match git commit -m "conflict to pick" &&
test_all_match git checkout -B base HEAD~1 &&
run_on_all ../edit-conflict DEF &&
test_all_match git add conflict &&
test_all_match git commit -m "conflict in base" &&
test_all_match test_must_fail git cherry-pick to-cherry-pick
'
test_expect_success 'stash' '
init_repos &&
write_script edit-contents <<-\EOF &&
echo text >>$1
EOF
# Stash a sparse directory (folder1)
test_all_match git checkout -b test-branch rename-base &&
test_all_match git reset --soft rename-out-to-out &&
test_all_match git stash &&
test_all_match git status --porcelain=v2 &&
# Apply the sparse directory stash without reinstating the index
test_all_match git stash apply -q &&
test_all_match git status --porcelain=v2 &&
# Reset to state where stash can be applied
test_sparse_match git sparse-checkout reapply &&
test_all_match git reset --hard rename-out-to-out &&
# Apply the sparse directory stash *with* reinstating the index
test_all_match git stash apply --index -q &&
test_all_match git status --porcelain=v2 &&
# Reset to state where we will get a conflict applying the stash
test_sparse_match git sparse-checkout reapply &&
test_all_match git reset --hard update-folder1 &&
# Apply the sparse directory stash with conflicts
test_all_match test_must_fail git stash apply --index -q &&
test_all_match test_must_fail git stash apply -q &&
test_all_match git status --porcelain=v2 &&
# Reset to base branch
test_sparse_match git sparse-checkout reapply &&
test_all_match git reset --hard base &&
# Stash & unstash an untracked file outside of the sparse checkout
# definition.
run_on_sparse mkdir -p folder1 &&
run_on_all ../edit-contents folder1/new &&
test_all_match git stash -u &&
test_all_match git status --porcelain=v2 &&
test_all_match git stash pop -q &&
test_all_match git status --porcelain=v2
'
test_expect_success 'checkout-index inside sparse definition' '
init_repos &&
run_on_all rm -f deep/a &&
test_all_match git checkout-index -- deep/a &&
test_all_match git status --porcelain=v2 &&
echo test >>new-a &&
run_on_all cp ../new-a a &&
test_all_match test_must_fail git checkout-index -- a &&
test_all_match git checkout-index -f -- a &&
test_all_match git status --porcelain=v2
'
test_expect_success 'checkout-index outside sparse definition' '
init_repos &&
# Without --ignore-skip-worktree-bits, outside-of-cone files will trigger
# an error
test_sparse_match test_must_fail git checkout-index -- folder1/a &&
test_grep "folder1/a has skip-worktree enabled" sparse-checkout-err &&
test_path_is_missing folder1/a &&
# With --ignore-skip-worktree-bits, outside-of-cone files are checked out
test_sparse_match git checkout-index --ignore-skip-worktree-bits -- folder1/a &&
test_cmp sparse-checkout/folder1/a sparse-index/folder1/a &&
test_cmp sparse-checkout/folder1/a full-checkout/folder1/a &&
run_on_sparse rm -rf folder1 &&
echo test >new-a &&
run_on_sparse mkdir -p folder1 &&
run_on_all cp ../new-a folder1/a &&
test_all_match test_must_fail git checkout-index --ignore-skip-worktree-bits -- folder1/a &&
test_all_match git checkout-index -f --ignore-skip-worktree-bits -- folder1/a &&
test_cmp sparse-checkout/folder1/a sparse-index/folder1/a &&
test_cmp sparse-checkout/folder1/a full-checkout/folder1/a
'
test_expect_success 'checkout-index with folders' '
init_repos &&
# Inside checkout definition
test_all_match test_must_fail git checkout-index -f -- deep/ &&
# Outside checkout definition
# Note: although all tests fail (as expected), the messaging differs. For
# non-sparse index checkouts, the error is that the "file" does not appear
# in the index; for sparse checkouts, the error is explicitly that the
# entry is a sparse directory.
run_on_all test_must_fail git checkout-index -f -- folder1/ &&
test_cmp full-checkout-err sparse-checkout-err &&
! test_cmp full-checkout-err sparse-index-err &&
grep "is a sparse directory" sparse-index-err
'
test_expect_success 'checkout-index --all' '
init_repos &&
test_all_match git checkout-index --all &&
test_sparse_match test_path_is_missing folder1 &&
# --ignore-skip-worktree-bits will cause `skip-worktree` files to be
# checked out, causing the outside-of-cone `folder1` to exist on-disk
test_all_match git checkout-index --ignore-skip-worktree-bits --all &&
test_all_match test_path_exists folder1
'
test_expect_success 'clean' '
init_repos &&
echo bogus >>.gitignore &&
run_on_all cp ../.gitignore . &&
test_all_match git add .gitignore &&
test_all_match git commit -m "ignore bogus files" &&
run_on_sparse mkdir folder1 &&
run_on_all mkdir -p deep/untracked-deep &&
run_on_all touch folder1/bogus &&
run_on_all touch folder1/untracked &&
run_on_all touch deep/untracked-deep/bogus &&
run_on_all touch deep/untracked-deep/untracked &&
test_all_match git status --porcelain=v2 &&
test_all_match git clean -f &&
test_all_match git status --porcelain=v2 &&
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
test_sparse_match ls &&
test_sparse_match ls folder1 &&
run_on_all test_path_exists folder1/bogus &&
run_on_all test_path_is_missing folder1/untracked &&
run_on_all test_path_exists deep/untracked-deep/bogus &&
run_on_all test_path_exists deep/untracked-deep/untracked &&
test_all_match git clean -fd &&
test_all_match git status --porcelain=v2 &&
test_sparse_match ls &&
test_sparse_match ls folder1 &&
run_on_all test_path_exists folder1/bogus &&
run_on_all test_path_exists deep/untracked-deep/bogus &&
run_on_all test_path_is_missing deep/untracked-deep/untracked &&
test_all_match git clean -xf &&
test_all_match git status --porcelain=v2 &&
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
test_sparse_match ls &&
test_sparse_match ls folder1 &&
run_on_all test_path_is_missing folder1/bogus &&
run_on_all test_path_exists deep/untracked-deep/bogus &&
test_all_match git clean -xdf &&
test_all_match git status --porcelain=v2 &&
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
test_sparse_match ls &&
test_sparse_match ls folder1 &&
run_on_all test_path_is_missing deep/untracked-deep/bogus &&
sparse-index: convert from full to sparse If we have a full index, then we can convert it to a sparse index by replacing directories outside of the sparse cone with sparse directory entries. The convert_to_sparse() method does this, when the situation is appropriate. For now, we avoid converting the index to a sparse index if: 1. the index is split. 2. the index is already sparse. 3. sparse-checkout is disabled. 4. sparse-checkout does not use cone mode. Finally, we currently limit the conversion to when the GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git config will be added in a later change. The trickiest thing about this conversion is that we might not be able to mark a directory as a sparse directory just because it is outside the sparse cone. There might be unmerged files within that directory, so we need to look for those. Also, if there is some strange reason why a file is not marked with CE_SKIP_WORKTREE, then we should give up on converting that directory. There is still hope that some of its subdirectories might be able to convert to sparse, so we keep looking deeper. The conversion process is assisted by the cache-tree extension. This is calculated from the full index if it does not already exist. We then abandon the cache-tree as it no longer applies to the newly-sparse index. Thus, this cache-tree will be recalculated in every sparse-full-sparse round-trip until we integrate the cache-tree extension with the sparse index. Some Git commands use the index after writing it. For example, 'git add' will update the index, then write it to disk, then read its entries to report information. To keep the in-memory index in a full state after writing, we re-expand it to a full one after the write. This is wasteful for commands that only write the index and do not read from it again, but that is only the case until we make those commands "sparse aware." We can compare the behavior of the sparse-index in t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1 when operating on the 'sparse-index' repo. We can also compare the two sparse repos directly, such as comparing their indexes (when expanded to full in the case of the 'sparse-index' repo). We also verify that the index is actually populated with sparse directory entries. The 'checkout and reset (mixed)' test is marked for failure when comparing a sparse repo to a full repo, but we can compare the two sparse-checkout cases directly to ensure that we are not changing the behavior when using a sparse index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 13:10:55 +00:00
test_sparse_match test_path_is_dir folder1
'
for builtin in show rev-parse
do
test_expect_success "$builtin (cached blobs/trees)" "
init_repos &&
test_all_match git $builtin :a &&
test_all_match git $builtin :deep/a &&
test_sparse_match git $builtin :folder1/a &&
# The error message differs depending on whether
# the directory exists in the worktree.
test_all_match test_must_fail git $builtin :deep/ &&
test_must_fail git -C full-checkout $builtin :folder1/ &&
test_sparse_match test_must_fail git $builtin :folder1/ &&
# Change the sparse cone for an extra case:
run_on_sparse git sparse-checkout set deep/deeper1 &&
# deep/deeper2 is a sparse directory in the sparse index.
test_sparse_match test_must_fail git $builtin :deep/deeper2/ &&
# deep/deeper2/deepest is not in the sparse index, but
# will trigger an index expansion.
test_sparse_match test_must_fail git $builtin :deep/deeper2/deepest/
"
done
test_expect_success 'submodule handling' '
init_repos &&
test_sparse_match git sparse-checkout add modules &&
test_all_match mkdir modules &&
test_all_match touch modules/a &&
test_all_match git add modules &&
test_all_match git commit -m "add modules directory" &&
test_config_global protocol.file.allow always &&
run_on_all git submodule add "$(pwd)/initial-repo" modules/sub &&
test_all_match git commit -m "add submodule" &&
# having a submodule prevents "modules" from collapse
test_sparse_match git sparse-checkout set deep/deeper1 &&
git -C sparse-index ls-files --sparse --stage >cache &&
grep "100644 .* modules/a" cache &&
grep "160000 $(git -C initial-repo rev-parse HEAD) 0 modules/sub" cache
'
# When working with a sparse index, some commands will need to expand the
# index to operate properly. If those commands also write the index back
# to disk, they need to convert the index to sparse before writing.
# This test verifies that both of these events are logged in trace2 logs.
test_expect_success 'sparse-index is expanded and converted back' '
init_repos &&
GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
git -C sparse-index reset -- folder1/a &&
test_region index convert_to_sparse trace2.txt &&
test_region index ensure_full_index trace2.txt &&
# ls-files expands on read, but does not write.
rm trace2.txt &&
GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
git -C sparse-index ls-files &&
status: use sparse-index throughout By testing 'git -c core.fsmonitor= status -uno', we can check for the simplest index operations that can be made sparse-aware. The necessary implementation details are already integrated with sparse-checkout, so modify command_requires_full_index to be zero for cmd_status(). In refresh_index(), we loop through the index entries to refresh their stat() information. However, sparse directories have no stat() information to populate. Ignore these entries. This allows 'git status' to no longer expand a sparse index to a full one. This is further tested by dropping the "-uno" option and adding an untracked file into the worktree. The performance test p2000-sparse-checkout-operations.sh demonstrates these improvements: Test HEAD~1 HEAD ----------------------------------------------------------------------------- 2000.2: git status (full-index-v3) 0.31(0.30+0.05) 0.31(0.29+0.06) +0.0% 2000.3: git status (full-index-v4) 0.31(0.29+0.07) 0.34(0.30+0.08) +9.7% 2000.4: git status (sparse-index-v3) 2.35(2.28+0.10) 0.04(0.04+0.05) -98.3% 2000.5: git status (sparse-index-v4) 2.35(2.24+0.15) 0.05(0.04+0.06) -97.9% Note that since HEAD~1 was expanding the sparse index by parsing trees, it was artificially slower than the full index case. Thus, the 98% improvement is misleading, and instead we should celebrate the 0.34s to 0.05s improvement of 85%. This is more indicative of the peformance gains we are expecting by using a sparse index. Note: we are dropping the assignment of core.fsmonitor here. This is not necessary for the test script as we are not altering the config any other way. Correct integration with FS Monitor will be validated in later changes. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-14 13:12:37 +00:00
test_region index ensure_full_index trace2.txt
'
test_expect_success 'index.sparse disabled inline uses full index' '
init_repos &&
# When index.sparse is disabled inline with `git status`, the
# index is expanded at the beginning of the execution then never
# converted back to sparse. It is then written to disk as a full index.
rm -f trace2.txt &&
GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
git -C sparse-index -c index.sparse=false status &&
! test_region index convert_to_sparse trace2.txt &&
test_region index ensure_full_index trace2.txt &&
# Since index.sparse is set to true at a repo level, the index
# is converted from full to sparse when read, then never expanded
# over the course of `git status`. It is written to disk as a sparse
# index.
rm -f trace2.txt &&
GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
git -C sparse-index status &&
test_region index convert_to_sparse trace2.txt &&
! test_region index ensure_full_index trace2.txt &&
# Now that the index has been written to disk as sparse, it is not
# converted to sparse (or expanded to full) when read by `git status`.
rm -f trace2.txt &&
GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
git -C sparse-index status &&
! test_region index convert_to_sparse trace2.txt &&
! test_region index ensure_full_index trace2.txt
'
run_sparse_index_trace2 () {
status: use sparse-index throughout By testing 'git -c core.fsmonitor= status -uno', we can check for the simplest index operations that can be made sparse-aware. The necessary implementation details are already integrated with sparse-checkout, so modify command_requires_full_index to be zero for cmd_status(). In refresh_index(), we loop through the index entries to refresh their stat() information. However, sparse directories have no stat() information to populate. Ignore these entries. This allows 'git status' to no longer expand a sparse index to a full one. This is further tested by dropping the "-uno" option and adding an untracked file into the worktree. The performance test p2000-sparse-checkout-operations.sh demonstrates these improvements: Test HEAD~1 HEAD ----------------------------------------------------------------------------- 2000.2: git status (full-index-v3) 0.31(0.30+0.05) 0.31(0.29+0.06) +0.0% 2000.3: git status (full-index-v4) 0.31(0.29+0.07) 0.34(0.30+0.08) +9.7% 2000.4: git status (sparse-index-v3) 2.35(2.28+0.10) 0.04(0.04+0.05) -98.3% 2000.5: git status (sparse-index-v4) 2.35(2.24+0.15) 0.05(0.04+0.06) -97.9% Note that since HEAD~1 was expanding the sparse index by parsing trees, it was artificially slower than the full index case. Thus, the 98% improvement is misleading, and instead we should celebrate the 0.34s to 0.05s improvement of 85%. This is more indicative of the peformance gains we are expecting by using a sparse index. Note: we are dropping the assignment of core.fsmonitor here. This is not necessary for the test script as we are not altering the config any other way. Correct integration with FS Monitor will be validated in later changes. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-14 13:12:37 +00:00
rm -f trace2.txt &&
if test -z "$WITHOUT_UNTRACKED_TXT"
then
echo >>sparse-index/untracked.txt
fi &&
if test "$1" = "!"
then
shift &&
test_must_fail env \
GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
git -C sparse-index "$@" \
>sparse-index-out \
2>sparse-index-error || return 1
else
GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
git -C sparse-index "$@" \
>sparse-index-out \
2>sparse-index-error || return 1
fi
}
ensure_expanded () {
run_sparse_index_trace2 "$@" &&
test_region index ensure_full_index trace2.txt
}
ensure_not_expanded () {
run_sparse_index_trace2 "$@" &&
status: use sparse-index throughout By testing 'git -c core.fsmonitor= status -uno', we can check for the simplest index operations that can be made sparse-aware. The necessary implementation details are already integrated with sparse-checkout, so modify command_requires_full_index to be zero for cmd_status(). In refresh_index(), we loop through the index entries to refresh their stat() information. However, sparse directories have no stat() information to populate. Ignore these entries. This allows 'git status' to no longer expand a sparse index to a full one. This is further tested by dropping the "-uno" option and adding an untracked file into the worktree. The performance test p2000-sparse-checkout-operations.sh demonstrates these improvements: Test HEAD~1 HEAD ----------------------------------------------------------------------------- 2000.2: git status (full-index-v3) 0.31(0.30+0.05) 0.31(0.29+0.06) +0.0% 2000.3: git status (full-index-v4) 0.31(0.29+0.07) 0.34(0.30+0.08) +9.7% 2000.4: git status (sparse-index-v3) 2.35(2.28+0.10) 0.04(0.04+0.05) -98.3% 2000.5: git status (sparse-index-v4) 2.35(2.24+0.15) 0.05(0.04+0.06) -97.9% Note that since HEAD~1 was expanding the sparse index by parsing trees, it was artificially slower than the full index case. Thus, the 98% improvement is misleading, and instead we should celebrate the 0.34s to 0.05s improvement of 85%. This is more indicative of the peformance gains we are expecting by using a sparse index. Note: we are dropping the assignment of core.fsmonitor here. This is not necessary for the test script as we are not altering the config any other way. Correct integration with FS Monitor will be validated in later changes. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-14 13:12:37 +00:00
test_region ! index ensure_full_index trace2.txt
commit: integrate with sparse-index Update 'git commit' to allow using the sparse-index in memory without expanding to a full one. The only place that had an ensure_full_index() call was in cache_tree_update(). The recursive algorithm for update_one() was already updated in 2de37c536 (cache-tree: integrate with sparse directory entries, 2021-03-03) to handle sparse directory entries in the index. Most of this change involves testing different command-line options that allow specifying which on-disk changes should be included in the commit. This includes no options (only take currently-staged changes), -a (take all tracked changes), and --include (take a list of specific changes). To simplify testing that these options do not expand the index, update the test that previously verified that 'git status' does not expand the index with a helper method, ensure_not_expanded(). This allows 'git commit' to operate much faster when the sparse-checkout cone is much smaller than the full list of files at HEAD. Here are the relevant lines from p2000-sparse-operations.sh: Test HEAD~1 HEAD ---------------------------------------------------------------------------------- 2000.14: git commit -a -m A (full-v3) 0.35(0.26+0.06) 0.36(0.28+0.07) +2.9% 2000.15: git commit -a -m A (full-v4) 0.32(0.26+0.05) 0.34(0.28+0.06) +6.3% 2000.16: git commit -a -m A (sparse-v3) 0.63(0.59+0.06) 0.04(0.05+0.05) -93.7% 2000.17: git commit -a -m A (sparse-v4) 0.64(0.59+0.08) 0.04(0.04+0.04) -93.8% It is important to compare the full-index case to the sparse-index case, so the improvement for index version v4 is actually an 88% improvement in this synthetic example. In a real repository with over two million files at HEAD and 60,000 files in the sparse-checkout definition, the time for 'git commit -a' went from 2.61 seconds to 134ms. I compared this to the result if the index only contained the paths in the sparse-checkout definition and found the theoretical optimum to be 120ms, so the out-of-cone paths only add a 12% overhead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-29 02:13:04 +00:00
}
test_expect_success 'sparse-index is not expanded' '
init_repos &&
ensure_not_expanded status &&
ensure_not_expanded ls-files --sparse &&
commit: integrate with sparse-index Update 'git commit' to allow using the sparse-index in memory without expanding to a full one. The only place that had an ensure_full_index() call was in cache_tree_update(). The recursive algorithm for update_one() was already updated in 2de37c536 (cache-tree: integrate with sparse directory entries, 2021-03-03) to handle sparse directory entries in the index. Most of this change involves testing different command-line options that allow specifying which on-disk changes should be included in the commit. This includes no options (only take currently-staged changes), -a (take all tracked changes), and --include (take a list of specific changes). To simplify testing that these options do not expand the index, update the test that previously verified that 'git status' does not expand the index with a helper method, ensure_not_expanded(). This allows 'git commit' to operate much faster when the sparse-checkout cone is much smaller than the full list of files at HEAD. Here are the relevant lines from p2000-sparse-operations.sh: Test HEAD~1 HEAD ---------------------------------------------------------------------------------- 2000.14: git commit -a -m A (full-v3) 0.35(0.26+0.06) 0.36(0.28+0.07) +2.9% 2000.15: git commit -a -m A (full-v4) 0.32(0.26+0.05) 0.34(0.28+0.06) +6.3% 2000.16: git commit -a -m A (sparse-v3) 0.63(0.59+0.06) 0.04(0.05+0.05) -93.7% 2000.17: git commit -a -m A (sparse-v4) 0.64(0.59+0.08) 0.04(0.04+0.04) -93.8% It is important to compare the full-index case to the sparse-index case, so the improvement for index version v4 is actually an 88% improvement in this synthetic example. In a real repository with over two million files at HEAD and 60,000 files in the sparse-checkout definition, the time for 'git commit -a' went from 2.61 seconds to 134ms. I compared this to the result if the index only contained the paths in the sparse-checkout definition and found the theoretical optimum to be 120ms, so the out-of-cone paths only add a 12% overhead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-29 02:13:04 +00:00
ensure_not_expanded commit --allow-empty -m empty &&
echo >>sparse-index/a &&
ensure_not_expanded commit -a -m a &&
echo >>sparse-index/a &&
ensure_not_expanded commit --include a -m a &&
echo >>sparse-index/deep/deeper1/a &&
checkout: stop expanding sparse indexes Previous changes did the necessary improvements to unpack-trees.c and diff-lib.c in order to modify a sparse index based on its comparision with a tree. The only remaining work is to remove some ensure_full_index() calls and add tests that verify that the index is not expanded in our interesting cases. Include 'switch' and 'restore' in these tests, as they share a base implementation with 'checkout'. Here are the relevant performance results from p2000-sparse-operations.sh: Test HEAD~1 HEAD -------------------------------------------------------------------------------- 2000.18: git checkout -f - (full-v3) 0.49(0.43+0.03) 0.47(0.39+0.05) -4.1% 2000.19: git checkout -f - (full-v4) 0.45(0.37+0.06) 0.42(0.37+0.05) -6.7% 2000.20: git checkout -f - (sparse-v3) 0.76(0.71+0.07) 0.04(0.03+0.04) -94.7% 2000.21: git checkout -f - (sparse-v4) 0.75(0.72+0.04) 0.05(0.06+0.04) -93.3% It is important to compare the full index case to the sparse index case, as the previous results for the sparse index were inflated by the index expansion. For index v4, this is an 88% improvement. On an internal repository with over two million paths at HEAD and a sparse-checkout definition containing ~60,000 of those paths, 'git checkout' went from 3.5s to 297ms with this change. The theoretical optimum where only those ~60,000 paths exist was 275ms, so the extra sparse directory entries contribute a 22ms overhead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-29 02:13:06 +00:00
ensure_not_expanded commit --include deep/deeper1/a -m deeper &&
ensure_not_expanded checkout rename-out-to-out &&
ensure_not_expanded checkout - &&
ensure_not_expanded switch rename-out-to-out &&
ensure_not_expanded switch - &&
ensure_not_expanded reset --hard &&
checkout: stop expanding sparse indexes Previous changes did the necessary improvements to unpack-trees.c and diff-lib.c in order to modify a sparse index based on its comparision with a tree. The only remaining work is to remove some ensure_full_index() calls and add tests that verify that the index is not expanded in our interesting cases. Include 'switch' and 'restore' in these tests, as they share a base implementation with 'checkout'. Here are the relevant performance results from p2000-sparse-operations.sh: Test HEAD~1 HEAD -------------------------------------------------------------------------------- 2000.18: git checkout -f - (full-v3) 0.49(0.43+0.03) 0.47(0.39+0.05) -4.1% 2000.19: git checkout -f - (full-v4) 0.45(0.37+0.06) 0.42(0.37+0.05) -6.7% 2000.20: git checkout -f - (sparse-v3) 0.76(0.71+0.07) 0.04(0.03+0.04) -94.7% 2000.21: git checkout -f - (sparse-v4) 0.75(0.72+0.04) 0.05(0.06+0.04) -93.3% It is important to compare the full index case to the sparse index case, as the previous results for the sparse index were inflated by the index expansion. For index v4, this is an 88% improvement. On an internal repository with over two million paths at HEAD and a sparse-checkout definition containing ~60,000 of those paths, 'git checkout' went from 3.5s to 297ms with this change. The theoretical optimum where only those ~60,000 paths exist was 275ms, so the extra sparse directory entries contribute a 22ms overhead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-29 02:13:06 +00:00
ensure_not_expanded checkout rename-out-to-out -- deep/deeper1 &&
ensure_not_expanded reset --hard &&
add: allow operating on a sparse-only index Disable command_requires_full_index for 'git add'. This does not require any additional removals of ensure_full_index(). The main reason is that 'git add' discovers changes based on the pathspec and the worktree itself. These are then inserted into the index directly, and calls to index_name_pos() or index_file_exists() already call expand_to_path() at the appropriate time to support a sparse-index. Add a test to check that 'git add -A' and 'git add <file>' does not expand the index at all, as long as <file> is not within a sparse directory. This does not help the global 'git add .' case. We can measure the improvement using p2000-sparse-operations.sh with these results: Test HEAD~1 HEAD ------------------------------------------------------------------------------ 2000.6: git add -A (full-index-v3) 0.35(0.30+0.05) 0.37(0.29+0.06) +5.7% 2000.7: git add -A (full-index-v4) 0.31(0.26+0.06) 0.33(0.27+0.06) +6.5% 2000.8: git add -A (sparse-index-v3) 0.57(0.53+0.07) 0.05(0.04+0.08) -91.2% 2000.9: git add -A (sparse-index-v4) 0.58(0.55+0.06) 0.05(0.05+0.06) -91.4% While the 91% improvement seems impressive, it's important to recognize that previously we had significant overhead for expanding the sparse-index. Comparing to the full index case, 'git add -A' goes from 0.37s to 0.05s, which is "only" an 86% improvement. This modification to 'git add' creates some behavior change depending on the use of a sparse index. We modify a test in t1092 to demonstrate these changes which will be remedied in future changes. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-29 14:52:04 +00:00
ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
echo >>sparse-index/README.md &&
ensure_not_expanded add -A &&
echo >>sparse-index/extra.txt &&
ensure_not_expanded add extra.txt &&
echo >>sparse-index/untracked.txt &&
ensure_not_expanded add . &&
ensure_not_expanded checkout-index -f a &&
ensure_not_expanded checkout-index -f --all &&
for ref in update-deep update-folder1 update-folder2 update-deep
do
echo >>sparse-index/README.md &&
ensure_not_expanded reset --hard $ref || return 1
done &&
ensure_not_expanded reset --mixed base &&
ensure_not_expanded reset --hard update-deep &&
ensure_not_expanded reset --keep base &&
ensure_not_expanded reset --merge update-deep &&
ensure_not_expanded reset --hard &&
ensure_not_expanded reset base -- deep/a &&
ensure_not_expanded reset base -- nonexistent-file &&
ensure_not_expanded reset deepest -- deep &&
# Although folder1 is outside the sparse definition, it exists as a
# directory entry in the index, so the pathspec will not force the
# index to be expanded.
ensure_not_expanded reset deepest -- folder1 &&
ensure_not_expanded reset deepest -- folder1/ &&
# Wildcard identifies only in-cone files, no index expansion
ensure_not_expanded reset deepest -- deep/\* &&
# Wildcard identifies only full sparse directories, no index expansion
ensure_not_expanded reset deepest -- folder\* &&
ensure_not_expanded clean -fd &&
ensure_not_expanded checkout -f update-deep &&
test_config -C sparse-index pull.twohead ort &&
(
sane_unset GIT_TEST_MERGE_ALGORITHM &&
for OPERATION in "merge -m merge" cherry-pick rebase
do
ensure_not_expanded merge -m merge update-folder1 &&
ensure_not_expanded merge -m merge update-folder2 || return 1
done
)
'
test_expect_success 'sparse-index is not expanded: merge conflict in cone' '
init_repos &&
for side in right left
do
git -C sparse-index checkout -b expand-$side base &&
echo $side >sparse-index/deep/a &&
git -C sparse-index commit -a -m "$side" || return 1
done &&
(
sane_unset GIT_TEST_MERGE_ALGORITHM &&
git -C sparse-index config pull.twohead ort &&
ensure_not_expanded ! merge -m merged expand-right
)
'
test_expect_success 'sparse-index is not expanded: stash' '
init_repos &&
echo >>sparse-index/a &&
ensure_not_expanded stash &&
ensure_not_expanded stash list &&
ensure_not_expanded stash show stash@{0} &&
ensure_not_expanded stash apply stash@{0} &&
ensure_not_expanded stash drop stash@{0} &&
echo >>sparse-index/deep/new &&
ensure_not_expanded stash -u &&
(
WITHOUT_UNTRACKED_TXT=1 &&
ensure_not_expanded stash pop
) &&
ensure_not_expanded stash create &&
oid=$(git -C sparse-index stash create) &&
ensure_not_expanded stash store -m "test" $oid &&
ensure_not_expanded reset --hard &&
ensure_not_expanded stash pop
'
test_expect_success 'describe tested on all' '
init_repos &&
# Add tag to be read by describe
run_on_all git tag -a v1.0 -m "Version 1" &&
test_all_match git describe --dirty &&
run_on_all rm g &&
test_all_match git describe --dirty
'
test_expect_success 'sparse-index is not expanded: describe' '
init_repos &&
# Add tag to be read by describe
git -C sparse-index tag -a v1.0 -m "Version 1" &&
ensure_not_expanded describe --dirty &&
echo "test" >>sparse-index/g &&
ensure_not_expanded describe --dirty &&
ensure_not_expanded describe
'
diff: enable and test the sparse index Enable the sparse index within the 'git diff' command. Its implementation already safely integrates with the sparse index because it shares code with the 'git status' and 'git checkout' commands that were already integrated. For more details see: d76723ee53 (status: use sparse-index throughout, 2021-07-14) 1ba5f45132 (checkout: stop expanding sparse indexes, 2021-06-29) The most interesting thing to do is to add tests that verify that 'git diff' behaves correctly when the sparse index is enabled. These cases are: 1. The index is not expanded for 'diff' and 'diff --staged' 2. 'diff' and 'diff --staged' behave the same in full checkout, sparse checkout, and sparse index repositories in the following partially-staged scenarios (i.e. the index, HEAD, and working directory differ at a given path): 1. Path is within sparse-checkout cone 2. Path is outside sparse-checkout cone 3. A merge conflict exists for paths outside sparse-checkout cone The `p2000` tests demonstrate a ~44% execution time reduction for 'git diff' and a ~86% execution time reduction for 'git diff --staged' using a sparse index: Test before after ------------------------------------------------------------- 2000.30: git diff (full-v3) 0.33 0.34 +3.0% 2000.31: git diff (full-v4) 0.33 0.35 +6.1% 2000.32: git diff (sparse-v3) 0.53 0.31 -41.5% 2000.33: git diff (sparse-v4) 0.54 0.29 -46.3% 2000.34: git diff --cached (full-v3) 0.07 0.07 +0.0% 2000.35: git diff --cached (full-v4) 0.07 0.08 +14.3% 2000.36: git diff --cached (sparse-v3) 0.28 0.04 -85.7% 2000.37: git diff --cached (sparse-v4) 0.23 0.03 -87.0% Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Lessley Dennington <lessleydennington@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-06 15:56:00 +00:00
test_expect_success 'sparse index is not expanded: diff' '
init_repos &&
write_script edit-contents <<-\EOF &&
echo text >>$1
EOF
# Add file within cone
test_sparse_match git sparse-checkout set deep &&
run_on_all ../edit-contents deep/testfile &&
test_all_match git add deep/testfile &&
run_on_all ../edit-contents deep/testfile &&
test_all_match git diff &&
test_all_match git diff --cached &&
ensure_not_expanded diff &&
ensure_not_expanded diff --cached &&
# Add file outside cone
test_all_match git reset --hard &&
run_on_all mkdir newdirectory &&
run_on_all ../edit-contents newdirectory/testfile &&
test_sparse_match git sparse-checkout set newdirectory &&
test_all_match git add newdirectory/testfile &&
run_on_all ../edit-contents newdirectory/testfile &&
test_sparse_match git sparse-checkout set &&
test_all_match git diff &&
test_all_match git diff --cached &&
ensure_not_expanded diff &&
ensure_not_expanded diff --cached &&
# Merge conflict outside cone
# The sparse checkout will report a warning that is not in the
# full checkout, so we use `run_on_all` instead of
# `test_all_match`
run_on_all git reset --hard &&
test_all_match git checkout merge-left &&
test_all_match test_must_fail git merge merge-right &&
test_all_match git diff &&
test_all_match git diff --cached &&
ensure_not_expanded diff &&
ensure_not_expanded diff --cached
'
test_expect_success 'sparse index is not expanded: show and rev-parse' '
init_repos &&
ensure_not_expanded show :a &&
ensure_not_expanded show :deep/a &&
ensure_not_expanded rev-parse :a &&
ensure_not_expanded rev-parse :deep/a
'
test_expect_success 'sparse index is not expanded: update-index' '
init_repos &&
deep_a_oid=$(git -C full-checkout rev-parse update-deep:deep/a) &&
ensure_not_expanded update-index --cacheinfo 100644 $deep_a_oid deep/a &&
echo "test" >sparse-index/README.md &&
echo "test2" >sparse-index/a &&
rm -f sparse-index/deep/a &&
ensure_not_expanded update-index --add README.md &&
ensure_not_expanded update-index a &&
ensure_not_expanded update-index --remove deep/a &&
ensure_not_expanded reset --soft update-deep &&
ensure_not_expanded update-index --add --remove --again
'
blame: enable and test the sparse index Enable the sparse index for the 'git blame' command. The index was already not expanded with this command, so the most interesting thing to do is to add tests that verify that 'git blame' behaves correctly when the sparse index is enabled and that its performance improves. More specifically, these cases are: 1. The index is not expanded for 'blame' when given paths in the sparse checkout cone at multiple levels. 2. Performance measurably improves for 'blame' with sparse index when given paths in the sparse checkout cone at multiple levels. The `p2000` tests demonstrate a ~60% execution time reduction when running 'blame' for a file two levels deep and and a ~30% execution time reduction for a file three levels deep. Test before after ---------------------------------------------------------------- 2000.62: git blame f2/f4/a (full-v3) 0.31 0.32 +3.2% 2000.63: git blame f2/f4/a (full-v4) 0.29 0.31 +6.9% 2000.64: git blame f2/f4/a (sparse-v3) 0.55 0.23 -58.2% 2000.65: git blame f2/f4/a (sparse-v4) 0.57 0.23 -59.6% 2000.66: git blame f2/f4/f3/a (full-v3) 0.77 0.85 +10.4% 2000.67: git blame f2/f4/f3/a (full-v4) 0.78 0.81 +3.8% 2000.68: git blame f2/f4/f3/a (sparse-v3) 1.07 0.72 -32.7% 2000.99: git blame f2/f4/f3/a (sparse-v4) 1.05 0.73 -30.5% We do not include paths outside the sparse checkout cone because blame does not support blaming files that are not present in the working directory. This is true in both sparse and full checkouts. Signed-off-by: Lessley Dennington <lessleydennington@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-06 15:56:01 +00:00
test_expect_success 'sparse index is not expanded: blame' '
init_repos &&
for file in a \
deep/a \
deep/deeper1/a \
deep/deeper1/deepest/a
do
ensure_not_expanded blame $file || return 1
blame: enable and test the sparse index Enable the sparse index for the 'git blame' command. The index was already not expanded with this command, so the most interesting thing to do is to add tests that verify that 'git blame' behaves correctly when the sparse index is enabled and that its performance improves. More specifically, these cases are: 1. The index is not expanded for 'blame' when given paths in the sparse checkout cone at multiple levels. 2. Performance measurably improves for 'blame' with sparse index when given paths in the sparse checkout cone at multiple levels. The `p2000` tests demonstrate a ~60% execution time reduction when running 'blame' for a file two levels deep and and a ~30% execution time reduction for a file three levels deep. Test before after ---------------------------------------------------------------- 2000.62: git blame f2/f4/a (full-v3) 0.31 0.32 +3.2% 2000.63: git blame f2/f4/a (full-v4) 0.29 0.31 +6.9% 2000.64: git blame f2/f4/a (sparse-v3) 0.55 0.23 -58.2% 2000.65: git blame f2/f4/a (sparse-v4) 0.57 0.23 -59.6% 2000.66: git blame f2/f4/f3/a (full-v3) 0.77 0.85 +10.4% 2000.67: git blame f2/f4/f3/a (full-v4) 0.78 0.81 +3.8% 2000.68: git blame f2/f4/f3/a (sparse-v3) 1.07 0.72 -32.7% 2000.99: git blame f2/f4/f3/a (sparse-v4) 1.05 0.73 -30.5% We do not include paths outside the sparse checkout cone because blame does not support blaming files that are not present in the working directory. This is true in both sparse and full checkouts. Signed-off-by: Lessley Dennington <lessleydennington@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-06 15:56:01 +00:00
done
'
test_expect_success 'sparse index is not expanded: fetch/pull' '
init_repos &&
git -C sparse-index remote add full "file://$(pwd)/full-checkout" &&
ensure_not_expanded fetch full &&
git -C full-checkout commit --allow-empty -m "for pull merge" &&
git -C sparse-index commit --allow-empty -m "for pull merge" &&
ensure_not_expanded pull full base
'
test_expect_success 'sparse index is not expanded: read-tree' '
init_repos &&
ensure_not_expanded checkout -b test-branch update-folder1 &&
for MERGE_TREES in "base HEAD update-folder2" \
"base HEAD rename-base" \
"base update-folder2" \
read-tree: make two-way merge sparse-aware Enable two-way merge with 'git read-tree' without expanding the sparse index. When in a sparse index, a two-way merge will trivially succeed as long as there are not changes to the same sparse directory in multiple trees (i.e., sparse directory-level "edit-edit" conflicts). If there are such conflicts, the merge will fail despite the possibility that individual files could merge cleanly. In order to resolve these "edit-edit" conflicts, "conflicted" sparse directories are - rather than rejected - merged by traversing their associated trees by OID. For each child of the sparse directory: 1. Files are merged as normal (see Documentation/git-read-tree.txt for details). 2. Subdirectories are treated as sparse directories and merged in 'twoway_merge'. If there are no conflicts, they are merged according to the rules in Documentation/git-read-tree.txt; otherwise, the subdirectory is recursively traversed and merged. This process allows sparse directories to be individually merged at the necessary depth *without* expanding a full index. The 't/t1092-sparse-checkout-compatibility.sh' test 'read-tree --merge with edit/edit conflicts in sparse directories' tests two-way merges with 1) changes inside sparse directories that do not conflict and 2) changes that do conflict (with the correct file(s) reported in the error message). Additionally, add two-way merge cases to 'sparse index is not expanded: read-tree' to confirm that the index is not expanded regardless of whether edit/edit conflicts are present in a sparse directory. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 20:24:30 +00:00
"base rename-base" \
"update-folder2"
do
ensure_not_expanded read-tree -mu $MERGE_TREES &&
ensure_not_expanded reset --hard || return 1
done &&
rm -rf sparse-index/deep/deeper2 &&
ensure_not_expanded add . &&
ensure_not_expanded commit -m "test" &&
ensure_not_expanded read-tree --prefix=deep/deeper2 -u deepest
'
test_expect_success 'ls-files' '
init_repos &&
# Use a smaller sparse-checkout for reduced output
test_sparse_match git sparse-checkout set &&
# Behavior agrees by default. Sparse index is expanded.
test_all_match git ls-files &&
# With --sparse, the sparse index data changes behavior.
git -C sparse-index ls-files --sparse >actual &&
cat >expect <<-\EOF &&
a
before/
deep/
e
folder1-
folder1.x
folder1/
folder10
folder2/
g
x/
z
EOF
test_cmp expect actual &&
# With --sparse and no sparse index, nothing changes.
git -C sparse-checkout ls-files >dense &&
git -C sparse-checkout ls-files --sparse >sparse &&
test_cmp dense sparse &&
# Set up a strange condition of having a file edit
repo_read_index: clear SKIP_WORKTREE bit from files present in worktree The fix is short (~30 lines), but the description is not. Sorry. There is a set of problems caused by files in what I'll refer to as the "present-despite-SKIP_WORKTREE" state. This commit aims to not just fix these problems, but remove the entire class as a possibility -- for those using sparse checkouts. But first, we need to understand the problems this class presents. A quick outline: * Problems * User facing issues * Problem space complexity * Maintenance and code correctness challenges * SKIP_WORKTREE expectations in Git * Suggested solution * Pros/Cons of suggested solution * Notes on testcase modifications === User facing issues === There are various ways for users to get files to be present in the working copy despite having the SKIP_WORKTREE bit set for that file in the index. This may come from: * various git commands not really supporting the SKIP_WORKTREE bit[1,2] * users grabbing files from elsewhere and writing them to the worktree (perhaps even cached in their editor) * users attempting to "abort" a sparse-checkout operation with a not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the working tree is not atomic)[3]. Once users have present-despite-SKIP_WORKTREE files, any modifications users make to these files will be ignored, possibly to users' confusion. Further: * these files will degrade performance for the sparse-index case due to requiring the index to be expanded (see commit 55dfcf9591 ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why we try to delete entire directories outside the sparse cone). * these files will not be updated by by standard commands (switch/checkout/pull/merge/rebase will leave them alone unless conflicts happen -- and even then, the conflicted file may be written somewhere else to avoid overwriting the SKIP_WORKTREE file that is present and in the way) * there is nothing in Git that users can use to discover such files (status, diff, grep, etc. all ignore it) * there is no reasonable mechanism to "recover" from such a condition (neither `git sparse-checkout reapply` nor `git reset --hard` will correct it). So, not only are users modifications ignored, but the files get progressively more stale over time. At some point in the future, they may change their sparseness specification or disable sparse-checkouts. At that time, all present-despite-SKIP_WORKTREE files will show up as having lots of modifications because they represent a version from a different branch or commit. These might include user-made local changes from days before, but the only way to tell is to have users look through them all closely. If these users come to others for help, there will be no logs that explain the issue; it's just a mysterious list of changes. Users might adamantly claim (correctly, as it turns out) that they didn't modify these files, while others presume they did. [1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ [2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/ === Problem space complexity === SKIP_WORKTREE has been part of Git for over a decade. Duy did lots of work on it initially, and several others have since come along and put lots of work into it. Stolee spent most of 2021 on the sparse-index, with lots of bugfixes along the way including to non-sparse-index cases as we are still trying to get sparse checkouts to behave reasonably. Basically every codepath throughout the treat needs to be aware of an additional type of file: tracked-but-not-present. The extra type results in lots of extra testcases and lots of extra code everywhere. But, the sad thing is that we actually have more than one extra type. We have tracked, tracked-but-not-present (SKIP_WORKTREE), and tracked-but-promised-to-not-be-present-but-is-present-anyway (present-despite-SKIP_WORKTREE). Two types is a monumental amount of effort to support, and adding a third feels a bit like insanity[4]. [4] Some examples of which can be seen at https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ === Maintenance and code correctness challenges === Matheus' patches to grep stalled for nearly a year, in part because of complications of how to handle sparse-checkouts appropriately in all cases[5][6] (with trying to sanely figure out how to sanely handle present-despite-SKIP_WORKTREE files being one of the complications). His rm/add follow-ups also took months because of those kinds of issues[7]. The corner cases with things like submodules and SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start becoming really complex[8]. We've had to add ugly logic to merge-ort to attempt to handle present-despite-SKIP_WORKTREE files[9], and basically just been forced to give up in merge-recursive knowing full well that we'll sometimes silently discard user modifications. Despite stash essentially being a merge, it needed extra code (beyond what was in merge-ort and merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid a few different bugs that'd result in an early abort with a partial stash application[10]. [5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t and the dates on the thread; also Matheus and I had several conversations off-list trying to resolve the issues over that time [6] ...it finally kind of got unstuck after https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ [7] See for example https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t and quotes like "The core functionality of sparse-checkout has always been only partially implemented", a statement I still believe is true today. [8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/ [9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries", 2021-03-20) [10] See commit ba359fd507 ("stash: fix stash application in sparse-checkouts", 2020-12-01) === SKIP_WORKTREE expectations in Git === A couple quotes: * From [11] (before the "sparse-checkout" command existed): If it needs too many special cases, hacks, and conditionals, then it is not worth the complexity---if it is easier to write a correct code by allowing Git to populate working tree files, it is perfectly fine to do so. In a sense, the sparse checkout "feature" itself is a hack by itself, and that is why I think this part should be "best effort" as well. * From the git-sparse-checkout manual (still present today): THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE. [11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ === Suggested solution === SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as the name of the option implies, to allow the file to NOT be in the worktree but consider it to be unchanged rather than deleted. The suggests a simple solution: present-despite-SKIP_WORKTREE files should not exist, for those using sparse-checkouts. Enforce this at index loading time by checking if core.sparseCheckout is true; if so, check files in the index with the SKIP_WORKTREE bit set to verify that they are absent from the working tree. If they are present, unset the bit (in memory, though any commands that write to the index will record the update). Users can, of course, can get the SKIP_WORKTREE bit back such as by running `git sparse-checkout reapply` (if they have ensured the file is unmodified and doesn't match the specified sparsity patterns). === Pros/Cons of suggested solution === Pros: * Solves the user visible problems reported above, which I've been complaining about for nearly a year but couldn't find a solution to. * Helps prevent slow performance degradation with a sparse-index. * Much easier behavior in sparse-checkouts for users to reason about * Very simple, ~30 lines of code. * Significantly simplifies some ugly testcases, and obviates the need to test an entire class of potential issues. * Reduces code complexity, reasoning, and maintenance. Avoids disagreements about weird corner cases[12]. * It has been reported that some users might be (ab)using SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree mechanism[13, and a few other similar references]. These users know of multiple caveats and shortcomings in doing so; perhaps not surprising given the "SKIP_WORKTREE expecations" section above. However, these users use `git update-index --skip-worktree`, and not `git sparse-checkout` or core.sparseCheckout=true. As such, these users would be unaffected by this change and can continue abusing the system as before. [12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree Cons: * When core.sparseCheckout is enabled, this adds a performance cost to reading the index. I'll defer discussion of this cost to a subsequent patch, since I have some optimizations to add. === Notes on testcase modifications === The good: * t1011: Compare to two cases above it ('read-tree will not throw away dirty changes, non-sparse'); since the file is present, it should match the non-sparse case now * t1092: sparse-index & sparse-checkout now match full-worktree behavior in more cases! Yaay for consistency! * t6428, t7012: look at how much simpler the tests become! Merge and stash can just fail early telling the user there's a file in the way, instead of not noticing until it's about to write a file and then have to implement sudden crash avoidance. Hurray for sanity! * t7817: sparse behavior better matches full tree behavior. Hurray for sanity! The confusing: * t3705: These changes were ONLY needed on Windows, but they don't hurt other platforms. Let's discuss each individually: * core.sparseCheckout should be false by default. Nothing in this testcase toggles that until many, many tests later. However, early tests (#5 in particular) were testing `update-index --skip-worktree` behavior in a non-sparse-checkout, but the Windows tests in CI were behaving as if core.sparseCheckout=true had been specified somewhere. I do not have access to a Windows machine. But I just manually did what should have been a no-op and turned the config off. And it fixed the test. * I have no idea why the leftover .gitattributes file from this test was causing failures for test #18 on Windows, but only with these changes of mine. Test #18 was checking for empty stderr, and specifically wanted to know that some error completely unrelated to file endings did not appear. The leftover .gitattributes file thus caused some spurious stderr unrelated to the thing being checked. Since other tests did not intend to test normalization, just proactively remove the .gitattributes file. I'm certain this is cleaner and better, I'm just unsure why/how this didn't trigger problems before. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-14 15:59:41 +00:00
# outside of the sparse-checkout cone. We want to verify
# that all modes handle this the same, and detect the
# modification.
write_script edit-content <<-\EOF &&
repo_read_index: clear SKIP_WORKTREE bit from files present in worktree The fix is short (~30 lines), but the description is not. Sorry. There is a set of problems caused by files in what I'll refer to as the "present-despite-SKIP_WORKTREE" state. This commit aims to not just fix these problems, but remove the entire class as a possibility -- for those using sparse checkouts. But first, we need to understand the problems this class presents. A quick outline: * Problems * User facing issues * Problem space complexity * Maintenance and code correctness challenges * SKIP_WORKTREE expectations in Git * Suggested solution * Pros/Cons of suggested solution * Notes on testcase modifications === User facing issues === There are various ways for users to get files to be present in the working copy despite having the SKIP_WORKTREE bit set for that file in the index. This may come from: * various git commands not really supporting the SKIP_WORKTREE bit[1,2] * users grabbing files from elsewhere and writing them to the worktree (perhaps even cached in their editor) * users attempting to "abort" a sparse-checkout operation with a not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the working tree is not atomic)[3]. Once users have present-despite-SKIP_WORKTREE files, any modifications users make to these files will be ignored, possibly to users' confusion. Further: * these files will degrade performance for the sparse-index case due to requiring the index to be expanded (see commit 55dfcf9591 ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why we try to delete entire directories outside the sparse cone). * these files will not be updated by by standard commands (switch/checkout/pull/merge/rebase will leave them alone unless conflicts happen -- and even then, the conflicted file may be written somewhere else to avoid overwriting the SKIP_WORKTREE file that is present and in the way) * there is nothing in Git that users can use to discover such files (status, diff, grep, etc. all ignore it) * there is no reasonable mechanism to "recover" from such a condition (neither `git sparse-checkout reapply` nor `git reset --hard` will correct it). So, not only are users modifications ignored, but the files get progressively more stale over time. At some point in the future, they may change their sparseness specification or disable sparse-checkouts. At that time, all present-despite-SKIP_WORKTREE files will show up as having lots of modifications because they represent a version from a different branch or commit. These might include user-made local changes from days before, but the only way to tell is to have users look through them all closely. If these users come to others for help, there will be no logs that explain the issue; it's just a mysterious list of changes. Users might adamantly claim (correctly, as it turns out) that they didn't modify these files, while others presume they did. [1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ [2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/ === Problem space complexity === SKIP_WORKTREE has been part of Git for over a decade. Duy did lots of work on it initially, and several others have since come along and put lots of work into it. Stolee spent most of 2021 on the sparse-index, with lots of bugfixes along the way including to non-sparse-index cases as we are still trying to get sparse checkouts to behave reasonably. Basically every codepath throughout the treat needs to be aware of an additional type of file: tracked-but-not-present. The extra type results in lots of extra testcases and lots of extra code everywhere. But, the sad thing is that we actually have more than one extra type. We have tracked, tracked-but-not-present (SKIP_WORKTREE), and tracked-but-promised-to-not-be-present-but-is-present-anyway (present-despite-SKIP_WORKTREE). Two types is a monumental amount of effort to support, and adding a third feels a bit like insanity[4]. [4] Some examples of which can be seen at https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ === Maintenance and code correctness challenges === Matheus' patches to grep stalled for nearly a year, in part because of complications of how to handle sparse-checkouts appropriately in all cases[5][6] (with trying to sanely figure out how to sanely handle present-despite-SKIP_WORKTREE files being one of the complications). His rm/add follow-ups also took months because of those kinds of issues[7]. The corner cases with things like submodules and SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start becoming really complex[8]. We've had to add ugly logic to merge-ort to attempt to handle present-despite-SKIP_WORKTREE files[9], and basically just been forced to give up in merge-recursive knowing full well that we'll sometimes silently discard user modifications. Despite stash essentially being a merge, it needed extra code (beyond what was in merge-ort and merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid a few different bugs that'd result in an early abort with a partial stash application[10]. [5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t and the dates on the thread; also Matheus and I had several conversations off-list trying to resolve the issues over that time [6] ...it finally kind of got unstuck after https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ [7] See for example https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t and quotes like "The core functionality of sparse-checkout has always been only partially implemented", a statement I still believe is true today. [8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/ [9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries", 2021-03-20) [10] See commit ba359fd507 ("stash: fix stash application in sparse-checkouts", 2020-12-01) === SKIP_WORKTREE expectations in Git === A couple quotes: * From [11] (before the "sparse-checkout" command existed): If it needs too many special cases, hacks, and conditionals, then it is not worth the complexity---if it is easier to write a correct code by allowing Git to populate working tree files, it is perfectly fine to do so. In a sense, the sparse checkout "feature" itself is a hack by itself, and that is why I think this part should be "best effort" as well. * From the git-sparse-checkout manual (still present today): THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE. [11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ === Suggested solution === SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as the name of the option implies, to allow the file to NOT be in the worktree but consider it to be unchanged rather than deleted. The suggests a simple solution: present-despite-SKIP_WORKTREE files should not exist, for those using sparse-checkouts. Enforce this at index loading time by checking if core.sparseCheckout is true; if so, check files in the index with the SKIP_WORKTREE bit set to verify that they are absent from the working tree. If they are present, unset the bit (in memory, though any commands that write to the index will record the update). Users can, of course, can get the SKIP_WORKTREE bit back such as by running `git sparse-checkout reapply` (if they have ensured the file is unmodified and doesn't match the specified sparsity patterns). === Pros/Cons of suggested solution === Pros: * Solves the user visible problems reported above, which I've been complaining about for nearly a year but couldn't find a solution to. * Helps prevent slow performance degradation with a sparse-index. * Much easier behavior in sparse-checkouts for users to reason about * Very simple, ~30 lines of code. * Significantly simplifies some ugly testcases, and obviates the need to test an entire class of potential issues. * Reduces code complexity, reasoning, and maintenance. Avoids disagreements about weird corner cases[12]. * It has been reported that some users might be (ab)using SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree mechanism[13, and a few other similar references]. These users know of multiple caveats and shortcomings in doing so; perhaps not surprising given the "SKIP_WORKTREE expecations" section above. However, these users use `git update-index --skip-worktree`, and not `git sparse-checkout` or core.sparseCheckout=true. As such, these users would be unaffected by this change and can continue abusing the system as before. [12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree Cons: * When core.sparseCheckout is enabled, this adds a performance cost to reading the index. I'll defer discussion of this cost to a subsequent patch, since I have some optimizations to add. === Notes on testcase modifications === The good: * t1011: Compare to two cases above it ('read-tree will not throw away dirty changes, non-sparse'); since the file is present, it should match the non-sparse case now * t1092: sparse-index & sparse-checkout now match full-worktree behavior in more cases! Yaay for consistency! * t6428, t7012: look at how much simpler the tests become! Merge and stash can just fail early telling the user there's a file in the way, instead of not noticing until it's about to write a file and then have to implement sudden crash avoidance. Hurray for sanity! * t7817: sparse behavior better matches full tree behavior. Hurray for sanity! The confusing: * t3705: These changes were ONLY needed on Windows, but they don't hurt other platforms. Let's discuss each individually: * core.sparseCheckout should be false by default. Nothing in this testcase toggles that until many, many tests later. However, early tests (#5 in particular) were testing `update-index --skip-worktree` behavior in a non-sparse-checkout, but the Windows tests in CI were behaving as if core.sparseCheckout=true had been specified somewhere. I do not have access to a Windows machine. But I just manually did what should have been a no-op and turned the config off. And it fixed the test. * I have no idea why the leftover .gitattributes file from this test was causing failures for test #18 on Windows, but only with these changes of mine. Test #18 was checking for empty stderr, and specifically wanted to know that some error completely unrelated to file endings did not appear. The leftover .gitattributes file thus caused some spurious stderr unrelated to the thing being checked. Since other tests did not intend to test normalization, just proactively remove the .gitattributes file. I'm certain this is cleaner and better, I'm just unsure why/how this didn't trigger problems before. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-14 15:59:41 +00:00
mkdir -p folder1 &&
echo content >>folder1/a
EOF
repo_read_index: clear SKIP_WORKTREE bit from files present in worktree The fix is short (~30 lines), but the description is not. Sorry. There is a set of problems caused by files in what I'll refer to as the "present-despite-SKIP_WORKTREE" state. This commit aims to not just fix these problems, but remove the entire class as a possibility -- for those using sparse checkouts. But first, we need to understand the problems this class presents. A quick outline: * Problems * User facing issues * Problem space complexity * Maintenance and code correctness challenges * SKIP_WORKTREE expectations in Git * Suggested solution * Pros/Cons of suggested solution * Notes on testcase modifications === User facing issues === There are various ways for users to get files to be present in the working copy despite having the SKIP_WORKTREE bit set for that file in the index. This may come from: * various git commands not really supporting the SKIP_WORKTREE bit[1,2] * users grabbing files from elsewhere and writing them to the worktree (perhaps even cached in their editor) * users attempting to "abort" a sparse-checkout operation with a not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the working tree is not atomic)[3]. Once users have present-despite-SKIP_WORKTREE files, any modifications users make to these files will be ignored, possibly to users' confusion. Further: * these files will degrade performance for the sparse-index case due to requiring the index to be expanded (see commit 55dfcf9591 ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why we try to delete entire directories outside the sparse cone). * these files will not be updated by by standard commands (switch/checkout/pull/merge/rebase will leave them alone unless conflicts happen -- and even then, the conflicted file may be written somewhere else to avoid overwriting the SKIP_WORKTREE file that is present and in the way) * there is nothing in Git that users can use to discover such files (status, diff, grep, etc. all ignore it) * there is no reasonable mechanism to "recover" from such a condition (neither `git sparse-checkout reapply` nor `git reset --hard` will correct it). So, not only are users modifications ignored, but the files get progressively more stale over time. At some point in the future, they may change their sparseness specification or disable sparse-checkouts. At that time, all present-despite-SKIP_WORKTREE files will show up as having lots of modifications because they represent a version from a different branch or commit. These might include user-made local changes from days before, but the only way to tell is to have users look through them all closely. If these users come to others for help, there will be no logs that explain the issue; it's just a mysterious list of changes. Users might adamantly claim (correctly, as it turns out) that they didn't modify these files, while others presume they did. [1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ [2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/ === Problem space complexity === SKIP_WORKTREE has been part of Git for over a decade. Duy did lots of work on it initially, and several others have since come along and put lots of work into it. Stolee spent most of 2021 on the sparse-index, with lots of bugfixes along the way including to non-sparse-index cases as we are still trying to get sparse checkouts to behave reasonably. Basically every codepath throughout the treat needs to be aware of an additional type of file: tracked-but-not-present. The extra type results in lots of extra testcases and lots of extra code everywhere. But, the sad thing is that we actually have more than one extra type. We have tracked, tracked-but-not-present (SKIP_WORKTREE), and tracked-but-promised-to-not-be-present-but-is-present-anyway (present-despite-SKIP_WORKTREE). Two types is a monumental amount of effort to support, and adding a third feels a bit like insanity[4]. [4] Some examples of which can be seen at https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ === Maintenance and code correctness challenges === Matheus' patches to grep stalled for nearly a year, in part because of complications of how to handle sparse-checkouts appropriately in all cases[5][6] (with trying to sanely figure out how to sanely handle present-despite-SKIP_WORKTREE files being one of the complications). His rm/add follow-ups also took months because of those kinds of issues[7]. The corner cases with things like submodules and SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start becoming really complex[8]. We've had to add ugly logic to merge-ort to attempt to handle present-despite-SKIP_WORKTREE files[9], and basically just been forced to give up in merge-recursive knowing full well that we'll sometimes silently discard user modifications. Despite stash essentially being a merge, it needed extra code (beyond what was in merge-ort and merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid a few different bugs that'd result in an early abort with a partial stash application[10]. [5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t and the dates on the thread; also Matheus and I had several conversations off-list trying to resolve the issues over that time [6] ...it finally kind of got unstuck after https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ [7] See for example https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t and quotes like "The core functionality of sparse-checkout has always been only partially implemented", a statement I still believe is true today. [8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/ [9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries", 2021-03-20) [10] See commit ba359fd507 ("stash: fix stash application in sparse-checkouts", 2020-12-01) === SKIP_WORKTREE expectations in Git === A couple quotes: * From [11] (before the "sparse-checkout" command existed): If it needs too many special cases, hacks, and conditionals, then it is not worth the complexity---if it is easier to write a correct code by allowing Git to populate working tree files, it is perfectly fine to do so. In a sense, the sparse checkout "feature" itself is a hack by itself, and that is why I think this part should be "best effort" as well. * From the git-sparse-checkout manual (still present today): THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE. [11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ === Suggested solution === SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as the name of the option implies, to allow the file to NOT be in the worktree but consider it to be unchanged rather than deleted. The suggests a simple solution: present-despite-SKIP_WORKTREE files should not exist, for those using sparse-checkouts. Enforce this at index loading time by checking if core.sparseCheckout is true; if so, check files in the index with the SKIP_WORKTREE bit set to verify that they are absent from the working tree. If they are present, unset the bit (in memory, though any commands that write to the index will record the update). Users can, of course, can get the SKIP_WORKTREE bit back such as by running `git sparse-checkout reapply` (if they have ensured the file is unmodified and doesn't match the specified sparsity patterns). === Pros/Cons of suggested solution === Pros: * Solves the user visible problems reported above, which I've been complaining about for nearly a year but couldn't find a solution to. * Helps prevent slow performance degradation with a sparse-index. * Much easier behavior in sparse-checkouts for users to reason about * Very simple, ~30 lines of code. * Significantly simplifies some ugly testcases, and obviates the need to test an entire class of potential issues. * Reduces code complexity, reasoning, and maintenance. Avoids disagreements about weird corner cases[12]. * It has been reported that some users might be (ab)using SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree mechanism[13, and a few other similar references]. These users know of multiple caveats and shortcomings in doing so; perhaps not surprising given the "SKIP_WORKTREE expecations" section above. However, these users use `git update-index --skip-worktree`, and not `git sparse-checkout` or core.sparseCheckout=true. As such, these users would be unaffected by this change and can continue abusing the system as before. [12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree Cons: * When core.sparseCheckout is enabled, this adds a performance cost to reading the index. I'll defer discussion of this cost to a subsequent patch, since I have some optimizations to add. === Notes on testcase modifications === The good: * t1011: Compare to two cases above it ('read-tree will not throw away dirty changes, non-sparse'); since the file is present, it should match the non-sparse case now * t1092: sparse-index & sparse-checkout now match full-worktree behavior in more cases! Yaay for consistency! * t6428, t7012: look at how much simpler the tests become! Merge and stash can just fail early telling the user there's a file in the way, instead of not noticing until it's about to write a file and then have to implement sudden crash avoidance. Hurray for sanity! * t7817: sparse behavior better matches full tree behavior. Hurray for sanity! The confusing: * t3705: These changes were ONLY needed on Windows, but they don't hurt other platforms. Let's discuss each individually: * core.sparseCheckout should be false by default. Nothing in this testcase toggles that until many, many tests later. However, early tests (#5 in particular) were testing `update-index --skip-worktree` behavior in a non-sparse-checkout, but the Windows tests in CI were behaving as if core.sparseCheckout=true had been specified somewhere. I do not have access to a Windows machine. But I just manually did what should have been a no-op and turned the config off. And it fixed the test. * I have no idea why the leftover .gitattributes file from this test was causing failures for test #18 on Windows, but only with these changes of mine. Test #18 was checking for empty stderr, and specifically wanted to know that some error completely unrelated to file endings did not appear. The leftover .gitattributes file thus caused some spurious stderr unrelated to the thing being checked. Since other tests did not intend to test normalization, just proactively remove the .gitattributes file. I'm certain this is cleaner and better, I'm just unsure why/how this didn't trigger problems before. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-14 15:59:41 +00:00
run_on_all ../edit-content &&
repo_read_index: clear SKIP_WORKTREE bit from files present in worktree The fix is short (~30 lines), but the description is not. Sorry. There is a set of problems caused by files in what I'll refer to as the "present-despite-SKIP_WORKTREE" state. This commit aims to not just fix these problems, but remove the entire class as a possibility -- for those using sparse checkouts. But first, we need to understand the problems this class presents. A quick outline: * Problems * User facing issues * Problem space complexity * Maintenance and code correctness challenges * SKIP_WORKTREE expectations in Git * Suggested solution * Pros/Cons of suggested solution * Notes on testcase modifications === User facing issues === There are various ways for users to get files to be present in the working copy despite having the SKIP_WORKTREE bit set for that file in the index. This may come from: * various git commands not really supporting the SKIP_WORKTREE bit[1,2] * users grabbing files from elsewhere and writing them to the worktree (perhaps even cached in their editor) * users attempting to "abort" a sparse-checkout operation with a not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the working tree is not atomic)[3]. Once users have present-despite-SKIP_WORKTREE files, any modifications users make to these files will be ignored, possibly to users' confusion. Further: * these files will degrade performance for the sparse-index case due to requiring the index to be expanded (see commit 55dfcf9591 ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why we try to delete entire directories outside the sparse cone). * these files will not be updated by by standard commands (switch/checkout/pull/merge/rebase will leave them alone unless conflicts happen -- and even then, the conflicted file may be written somewhere else to avoid overwriting the SKIP_WORKTREE file that is present and in the way) * there is nothing in Git that users can use to discover such files (status, diff, grep, etc. all ignore it) * there is no reasonable mechanism to "recover" from such a condition (neither `git sparse-checkout reapply` nor `git reset --hard` will correct it). So, not only are users modifications ignored, but the files get progressively more stale over time. At some point in the future, they may change their sparseness specification or disable sparse-checkouts. At that time, all present-despite-SKIP_WORKTREE files will show up as having lots of modifications because they represent a version from a different branch or commit. These might include user-made local changes from days before, but the only way to tell is to have users look through them all closely. If these users come to others for help, there will be no logs that explain the issue; it's just a mysterious list of changes. Users might adamantly claim (correctly, as it turns out) that they didn't modify these files, while others presume they did. [1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ [2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/ === Problem space complexity === SKIP_WORKTREE has been part of Git for over a decade. Duy did lots of work on it initially, and several others have since come along and put lots of work into it. Stolee spent most of 2021 on the sparse-index, with lots of bugfixes along the way including to non-sparse-index cases as we are still trying to get sparse checkouts to behave reasonably. Basically every codepath throughout the treat needs to be aware of an additional type of file: tracked-but-not-present. The extra type results in lots of extra testcases and lots of extra code everywhere. But, the sad thing is that we actually have more than one extra type. We have tracked, tracked-but-not-present (SKIP_WORKTREE), and tracked-but-promised-to-not-be-present-but-is-present-anyway (present-despite-SKIP_WORKTREE). Two types is a monumental amount of effort to support, and adding a third feels a bit like insanity[4]. [4] Some examples of which can be seen at https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ === Maintenance and code correctness challenges === Matheus' patches to grep stalled for nearly a year, in part because of complications of how to handle sparse-checkouts appropriately in all cases[5][6] (with trying to sanely figure out how to sanely handle present-despite-SKIP_WORKTREE files being one of the complications). His rm/add follow-ups also took months because of those kinds of issues[7]. The corner cases with things like submodules and SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start becoming really complex[8]. We've had to add ugly logic to merge-ort to attempt to handle present-despite-SKIP_WORKTREE files[9], and basically just been forced to give up in merge-recursive knowing full well that we'll sometimes silently discard user modifications. Despite stash essentially being a merge, it needed extra code (beyond what was in merge-ort and merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid a few different bugs that'd result in an early abort with a partial stash application[10]. [5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t and the dates on the thread; also Matheus and I had several conversations off-list trying to resolve the issues over that time [6] ...it finally kind of got unstuck after https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ [7] See for example https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t and quotes like "The core functionality of sparse-checkout has always been only partially implemented", a statement I still believe is true today. [8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/ [9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries", 2021-03-20) [10] See commit ba359fd507 ("stash: fix stash application in sparse-checkouts", 2020-12-01) === SKIP_WORKTREE expectations in Git === A couple quotes: * From [11] (before the "sparse-checkout" command existed): If it needs too many special cases, hacks, and conditionals, then it is not worth the complexity---if it is easier to write a correct code by allowing Git to populate working tree files, it is perfectly fine to do so. In a sense, the sparse checkout "feature" itself is a hack by itself, and that is why I think this part should be "best effort" as well. * From the git-sparse-checkout manual (still present today): THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE. [11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ === Suggested solution === SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as the name of the option implies, to allow the file to NOT be in the worktree but consider it to be unchanged rather than deleted. The suggests a simple solution: present-despite-SKIP_WORKTREE files should not exist, for those using sparse-checkouts. Enforce this at index loading time by checking if core.sparseCheckout is true; if so, check files in the index with the SKIP_WORKTREE bit set to verify that they are absent from the working tree. If they are present, unset the bit (in memory, though any commands that write to the index will record the update). Users can, of course, can get the SKIP_WORKTREE bit back such as by running `git sparse-checkout reapply` (if they have ensured the file is unmodified and doesn't match the specified sparsity patterns). === Pros/Cons of suggested solution === Pros: * Solves the user visible problems reported above, which I've been complaining about for nearly a year but couldn't find a solution to. * Helps prevent slow performance degradation with a sparse-index. * Much easier behavior in sparse-checkouts for users to reason about * Very simple, ~30 lines of code. * Significantly simplifies some ugly testcases, and obviates the need to test an entire class of potential issues. * Reduces code complexity, reasoning, and maintenance. Avoids disagreements about weird corner cases[12]. * It has been reported that some users might be (ab)using SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree mechanism[13, and a few other similar references]. These users know of multiple caveats and shortcomings in doing so; perhaps not surprising given the "SKIP_WORKTREE expecations" section above. However, these users use `git update-index --skip-worktree`, and not `git sparse-checkout` or core.sparseCheckout=true. As such, these users would be unaffected by this change and can continue abusing the system as before. [12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree Cons: * When core.sparseCheckout is enabled, this adds a performance cost to reading the index. I'll defer discussion of this cost to a subsequent patch, since I have some optimizations to add. === Notes on testcase modifications === The good: * t1011: Compare to two cases above it ('read-tree will not throw away dirty changes, non-sparse'); since the file is present, it should match the non-sparse case now * t1092: sparse-index & sparse-checkout now match full-worktree behavior in more cases! Yaay for consistency! * t6428, t7012: look at how much simpler the tests become! Merge and stash can just fail early telling the user there's a file in the way, instead of not noticing until it's about to write a file and then have to implement sudden crash avoidance. Hurray for sanity! * t7817: sparse behavior better matches full tree behavior. Hurray for sanity! The confusing: * t3705: These changes were ONLY needed on Windows, but they don't hurt other platforms. Let's discuss each individually: * core.sparseCheckout should be false by default. Nothing in this testcase toggles that until many, many tests later. However, early tests (#5 in particular) were testing `update-index --skip-worktree` behavior in a non-sparse-checkout, but the Windows tests in CI were behaving as if core.sparseCheckout=true had been specified somewhere. I do not have access to a Windows machine. But I just manually did what should have been a no-op and turned the config off. And it fixed the test. * I have no idea why the leftover .gitattributes file from this test was causing failures for test #18 on Windows, but only with these changes of mine. Test #18 was checking for empty stderr, and specifically wanted to know that some error completely unrelated to file endings did not appear. The leftover .gitattributes file thus caused some spurious stderr unrelated to the thing being checked. Since other tests did not intend to test normalization, just proactively remove the .gitattributes file. I'm certain this is cleaner and better, I'm just unsure why/how this didn't trigger problems before. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-14 15:59:41 +00:00
test_all_match git ls-files --modified &&
git -C sparse-index ls-files --sparse --modified >sparse-index-out &&
repo_read_index: clear SKIP_WORKTREE bit from files present in worktree The fix is short (~30 lines), but the description is not. Sorry. There is a set of problems caused by files in what I'll refer to as the "present-despite-SKIP_WORKTREE" state. This commit aims to not just fix these problems, but remove the entire class as a possibility -- for those using sparse checkouts. But first, we need to understand the problems this class presents. A quick outline: * Problems * User facing issues * Problem space complexity * Maintenance and code correctness challenges * SKIP_WORKTREE expectations in Git * Suggested solution * Pros/Cons of suggested solution * Notes on testcase modifications === User facing issues === There are various ways for users to get files to be present in the working copy despite having the SKIP_WORKTREE bit set for that file in the index. This may come from: * various git commands not really supporting the SKIP_WORKTREE bit[1,2] * users grabbing files from elsewhere and writing them to the worktree (perhaps even cached in their editor) * users attempting to "abort" a sparse-checkout operation with a not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the working tree is not atomic)[3]. Once users have present-despite-SKIP_WORKTREE files, any modifications users make to these files will be ignored, possibly to users' confusion. Further: * these files will degrade performance for the sparse-index case due to requiring the index to be expanded (see commit 55dfcf9591 ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why we try to delete entire directories outside the sparse cone). * these files will not be updated by by standard commands (switch/checkout/pull/merge/rebase will leave them alone unless conflicts happen -- and even then, the conflicted file may be written somewhere else to avoid overwriting the SKIP_WORKTREE file that is present and in the way) * there is nothing in Git that users can use to discover such files (status, diff, grep, etc. all ignore it) * there is no reasonable mechanism to "recover" from such a condition (neither `git sparse-checkout reapply` nor `git reset --hard` will correct it). So, not only are users modifications ignored, but the files get progressively more stale over time. At some point in the future, they may change their sparseness specification or disable sparse-checkouts. At that time, all present-despite-SKIP_WORKTREE files will show up as having lots of modifications because they represent a version from a different branch or commit. These might include user-made local changes from days before, but the only way to tell is to have users look through them all closely. If these users come to others for help, there will be no logs that explain the issue; it's just a mysterious list of changes. Users might adamantly claim (correctly, as it turns out) that they didn't modify these files, while others presume they did. [1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ [2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/ === Problem space complexity === SKIP_WORKTREE has been part of Git for over a decade. Duy did lots of work on it initially, and several others have since come along and put lots of work into it. Stolee spent most of 2021 on the sparse-index, with lots of bugfixes along the way including to non-sparse-index cases as we are still trying to get sparse checkouts to behave reasonably. Basically every codepath throughout the treat needs to be aware of an additional type of file: tracked-but-not-present. The extra type results in lots of extra testcases and lots of extra code everywhere. But, the sad thing is that we actually have more than one extra type. We have tracked, tracked-but-not-present (SKIP_WORKTREE), and tracked-but-promised-to-not-be-present-but-is-present-anyway (present-despite-SKIP_WORKTREE). Two types is a monumental amount of effort to support, and adding a third feels a bit like insanity[4]. [4] Some examples of which can be seen at https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ === Maintenance and code correctness challenges === Matheus' patches to grep stalled for nearly a year, in part because of complications of how to handle sparse-checkouts appropriately in all cases[5][6] (with trying to sanely figure out how to sanely handle present-despite-SKIP_WORKTREE files being one of the complications). His rm/add follow-ups also took months because of those kinds of issues[7]. The corner cases with things like submodules and SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start becoming really complex[8]. We've had to add ugly logic to merge-ort to attempt to handle present-despite-SKIP_WORKTREE files[9], and basically just been forced to give up in merge-recursive knowing full well that we'll sometimes silently discard user modifications. Despite stash essentially being a merge, it needed extra code (beyond what was in merge-ort and merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid a few different bugs that'd result in an early abort with a partial stash application[10]. [5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t and the dates on the thread; also Matheus and I had several conversations off-list trying to resolve the issues over that time [6] ...it finally kind of got unstuck after https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ [7] See for example https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t and quotes like "The core functionality of sparse-checkout has always been only partially implemented", a statement I still believe is true today. [8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/ [9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries", 2021-03-20) [10] See commit ba359fd507 ("stash: fix stash application in sparse-checkouts", 2020-12-01) === SKIP_WORKTREE expectations in Git === A couple quotes: * From [11] (before the "sparse-checkout" command existed): If it needs too many special cases, hacks, and conditionals, then it is not worth the complexity---if it is easier to write a correct code by allowing Git to populate working tree files, it is perfectly fine to do so. In a sense, the sparse checkout "feature" itself is a hack by itself, and that is why I think this part should be "best effort" as well. * From the git-sparse-checkout manual (still present today): THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE. [11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ === Suggested solution === SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as the name of the option implies, to allow the file to NOT be in the worktree but consider it to be unchanged rather than deleted. The suggests a simple solution: present-despite-SKIP_WORKTREE files should not exist, for those using sparse-checkouts. Enforce this at index loading time by checking if core.sparseCheckout is true; if so, check files in the index with the SKIP_WORKTREE bit set to verify that they are absent from the working tree. If they are present, unset the bit (in memory, though any commands that write to the index will record the update). Users can, of course, can get the SKIP_WORKTREE bit back such as by running `git sparse-checkout reapply` (if they have ensured the file is unmodified and doesn't match the specified sparsity patterns). === Pros/Cons of suggested solution === Pros: * Solves the user visible problems reported above, which I've been complaining about for nearly a year but couldn't find a solution to. * Helps prevent slow performance degradation with a sparse-index. * Much easier behavior in sparse-checkouts for users to reason about * Very simple, ~30 lines of code. * Significantly simplifies some ugly testcases, and obviates the need to test an entire class of potential issues. * Reduces code complexity, reasoning, and maintenance. Avoids disagreements about weird corner cases[12]. * It has been reported that some users might be (ab)using SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree mechanism[13, and a few other similar references]. These users know of multiple caveats and shortcomings in doing so; perhaps not surprising given the "SKIP_WORKTREE expecations" section above. However, these users use `git update-index --skip-worktree`, and not `git sparse-checkout` or core.sparseCheckout=true. As such, these users would be unaffected by this change and can continue abusing the system as before. [12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree Cons: * When core.sparseCheckout is enabled, this adds a performance cost to reading the index. I'll defer discussion of this cost to a subsequent patch, since I have some optimizations to add. === Notes on testcase modifications === The good: * t1011: Compare to two cases above it ('read-tree will not throw away dirty changes, non-sparse'); since the file is present, it should match the non-sparse case now * t1092: sparse-index & sparse-checkout now match full-worktree behavior in more cases! Yaay for consistency! * t6428, t7012: look at how much simpler the tests become! Merge and stash can just fail early telling the user there's a file in the way, instead of not noticing until it's about to write a file and then have to implement sudden crash avoidance. Hurray for sanity! * t7817: sparse behavior better matches full tree behavior. Hurray for sanity! The confusing: * t3705: These changes were ONLY needed on Windows, but they don't hurt other platforms. Let's discuss each individually: * core.sparseCheckout should be false by default. Nothing in this testcase toggles that until many, many tests later. However, early tests (#5 in particular) were testing `update-index --skip-worktree` behavior in a non-sparse-checkout, but the Windows tests in CI were behaving as if core.sparseCheckout=true had been specified somewhere. I do not have access to a Windows machine. But I just manually did what should have been a no-op and turned the config off. And it fixed the test. * I have no idea why the leftover .gitattributes file from this test was causing failures for test #18 on Windows, but only with these changes of mine. Test #18 was checking for empty stderr, and specifically wanted to know that some error completely unrelated to file endings did not appear. The leftover .gitattributes file thus caused some spurious stderr unrelated to the thing being checked. Since other tests did not intend to test normalization, just proactively remove the .gitattributes file. I'm certain this is cleaner and better, I'm just unsure why/how this didn't trigger problems before. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-14 15:59:41 +00:00
cat >expect <<-\EOF &&
folder1/a
EOF
test_cmp expect sparse-index-out &&
# Add folder1 to the sparse-checkout cone and
# check that ls-files shows the expanded files.
test_sparse_match git sparse-checkout add folder1 &&
repo_read_index: clear SKIP_WORKTREE bit from files present in worktree The fix is short (~30 lines), but the description is not. Sorry. There is a set of problems caused by files in what I'll refer to as the "present-despite-SKIP_WORKTREE" state. This commit aims to not just fix these problems, but remove the entire class as a possibility -- for those using sparse checkouts. But first, we need to understand the problems this class presents. A quick outline: * Problems * User facing issues * Problem space complexity * Maintenance and code correctness challenges * SKIP_WORKTREE expectations in Git * Suggested solution * Pros/Cons of suggested solution * Notes on testcase modifications === User facing issues === There are various ways for users to get files to be present in the working copy despite having the SKIP_WORKTREE bit set for that file in the index. This may come from: * various git commands not really supporting the SKIP_WORKTREE bit[1,2] * users grabbing files from elsewhere and writing them to the worktree (perhaps even cached in their editor) * users attempting to "abort" a sparse-checkout operation with a not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the working tree is not atomic)[3]. Once users have present-despite-SKIP_WORKTREE files, any modifications users make to these files will be ignored, possibly to users' confusion. Further: * these files will degrade performance for the sparse-index case due to requiring the index to be expanded (see commit 55dfcf9591 ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why we try to delete entire directories outside the sparse cone). * these files will not be updated by by standard commands (switch/checkout/pull/merge/rebase will leave them alone unless conflicts happen -- and even then, the conflicted file may be written somewhere else to avoid overwriting the SKIP_WORKTREE file that is present and in the way) * there is nothing in Git that users can use to discover such files (status, diff, grep, etc. all ignore it) * there is no reasonable mechanism to "recover" from such a condition (neither `git sparse-checkout reapply` nor `git reset --hard` will correct it). So, not only are users modifications ignored, but the files get progressively more stale over time. At some point in the future, they may change their sparseness specification or disable sparse-checkouts. At that time, all present-despite-SKIP_WORKTREE files will show up as having lots of modifications because they represent a version from a different branch or commit. These might include user-made local changes from days before, but the only way to tell is to have users look through them all closely. If these users come to others for help, there will be no logs that explain the issue; it's just a mysterious list of changes. Users might adamantly claim (correctly, as it turns out) that they didn't modify these files, while others presume they did. [1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ [2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/ === Problem space complexity === SKIP_WORKTREE has been part of Git for over a decade. Duy did lots of work on it initially, and several others have since come along and put lots of work into it. Stolee spent most of 2021 on the sparse-index, with lots of bugfixes along the way including to non-sparse-index cases as we are still trying to get sparse checkouts to behave reasonably. Basically every codepath throughout the treat needs to be aware of an additional type of file: tracked-but-not-present. The extra type results in lots of extra testcases and lots of extra code everywhere. But, the sad thing is that we actually have more than one extra type. We have tracked, tracked-but-not-present (SKIP_WORKTREE), and tracked-but-promised-to-not-be-present-but-is-present-anyway (present-despite-SKIP_WORKTREE). Two types is a monumental amount of effort to support, and adding a third feels a bit like insanity[4]. [4] Some examples of which can be seen at https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ === Maintenance and code correctness challenges === Matheus' patches to grep stalled for nearly a year, in part because of complications of how to handle sparse-checkouts appropriately in all cases[5][6] (with trying to sanely figure out how to sanely handle present-despite-SKIP_WORKTREE files being one of the complications). His rm/add follow-ups also took months because of those kinds of issues[7]. The corner cases with things like submodules and SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start becoming really complex[8]. We've had to add ugly logic to merge-ort to attempt to handle present-despite-SKIP_WORKTREE files[9], and basically just been forced to give up in merge-recursive knowing full well that we'll sometimes silently discard user modifications. Despite stash essentially being a merge, it needed extra code (beyond what was in merge-ort and merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid a few different bugs that'd result in an early abort with a partial stash application[10]. [5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t and the dates on the thread; also Matheus and I had several conversations off-list trying to resolve the issues over that time [6] ...it finally kind of got unstuck after https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/ [7] See for example https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t and quotes like "The core functionality of sparse-checkout has always been only partially implemented", a statement I still believe is true today. [8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/ [9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries", 2021-03-20) [10] See commit ba359fd507 ("stash: fix stash application in sparse-checkouts", 2020-12-01) === SKIP_WORKTREE expectations in Git === A couple quotes: * From [11] (before the "sparse-checkout" command existed): If it needs too many special cases, hacks, and conditionals, then it is not worth the complexity---if it is easier to write a correct code by allowing Git to populate working tree files, it is perfectly fine to do so. In a sense, the sparse checkout "feature" itself is a hack by itself, and that is why I think this part should be "best effort" as well. * From the git-sparse-checkout manual (still present today): THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE. [11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/ === Suggested solution === SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as the name of the option implies, to allow the file to NOT be in the worktree but consider it to be unchanged rather than deleted. The suggests a simple solution: present-despite-SKIP_WORKTREE files should not exist, for those using sparse-checkouts. Enforce this at index loading time by checking if core.sparseCheckout is true; if so, check files in the index with the SKIP_WORKTREE bit set to verify that they are absent from the working tree. If they are present, unset the bit (in memory, though any commands that write to the index will record the update). Users can, of course, can get the SKIP_WORKTREE bit back such as by running `git sparse-checkout reapply` (if they have ensured the file is unmodified and doesn't match the specified sparsity patterns). === Pros/Cons of suggested solution === Pros: * Solves the user visible problems reported above, which I've been complaining about for nearly a year but couldn't find a solution to. * Helps prevent slow performance degradation with a sparse-index. * Much easier behavior in sparse-checkouts for users to reason about * Very simple, ~30 lines of code. * Significantly simplifies some ugly testcases, and obviates the need to test an entire class of potential issues. * Reduces code complexity, reasoning, and maintenance. Avoids disagreements about weird corner cases[12]. * It has been reported that some users might be (ab)using SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree mechanism[13, and a few other similar references]. These users know of multiple caveats and shortcomings in doing so; perhaps not surprising given the "SKIP_WORKTREE expecations" section above. However, these users use `git update-index --skip-worktree`, and not `git sparse-checkout` or core.sparseCheckout=true. As such, these users would be unaffected by this change and can continue abusing the system as before. [12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/ [13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree Cons: * When core.sparseCheckout is enabled, this adds a performance cost to reading the index. I'll defer discussion of this cost to a subsequent patch, since I have some optimizations to add. === Notes on testcase modifications === The good: * t1011: Compare to two cases above it ('read-tree will not throw away dirty changes, non-sparse'); since the file is present, it should match the non-sparse case now * t1092: sparse-index & sparse-checkout now match full-worktree behavior in more cases! Yaay for consistency! * t6428, t7012: look at how much simpler the tests become! Merge and stash can just fail early telling the user there's a file in the way, instead of not noticing until it's about to write a file and then have to implement sudden crash avoidance. Hurray for sanity! * t7817: sparse behavior better matches full tree behavior. Hurray for sanity! The confusing: * t3705: These changes were ONLY needed on Windows, but they don't hurt other platforms. Let's discuss each individually: * core.sparseCheckout should be false by default. Nothing in this testcase toggles that until many, many tests later. However, early tests (#5 in particular) were testing `update-index --skip-worktree` behavior in a non-sparse-checkout, but the Windows tests in CI were behaving as if core.sparseCheckout=true had been specified somewhere. I do not have access to a Windows machine. But I just manually did what should have been a no-op and turned the config off. And it fixed the test. * I have no idea why the leftover .gitattributes file from this test was causing failures for test #18 on Windows, but only with these changes of mine. Test #18 was checking for empty stderr, and specifically wanted to know that some error completely unrelated to file endings did not appear. The leftover .gitattributes file thus caused some spurious stderr unrelated to the thing being checked. Since other tests did not intend to test normalization, just proactively remove the .gitattributes file. I'm certain this is cleaner and better, I'm just unsure why/how this didn't trigger problems before. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-14 15:59:41 +00:00
test_all_match git ls-files --modified &&
test_all_match git ls-files &&
git -C sparse-index ls-files --sparse >actual &&
cat >expect <<-\EOF &&
a
before/
deep/
e
folder1-
folder1.x
folder1/0/0/0
folder1/0/1
folder1/a
folder10
folder2/
g
x/
z
EOF
test_cmp expect actual &&
# Double-check index expansion is avoided
ensure_not_expanded ls-files --sparse
'
sparse-checkout: integrate with sparse index When modifying the sparse-checkout definition, the sparse-checkout builtin calls update_sparsity() to modify the SKIP_WORKTREE bits of all cache entries in the index. Before, we needed the index to be fully expanded in order to ensure we had the full list of files necessary that match the new patterns. Insert a call to reset_sparse_directories() that expands sparse directories that are within the new pattern list, but only far enough that every necessary file path now exists as a cache entry. The remaining logic within update_sparsity() will modify the SKIP_WORKTREE bits appropriately. This allows us to disable command_requires_full_index within the sparse-checkout builtin. Add tests that demonstrate that we are not expanding to a full index unnecessarily. We can see the improved performance in the p2000 test script: Test HEAD~1 HEAD ------------------------------------------------------------------------ 2000.24: git ... (sparse-v3) 2.14(1.55+0.58) 1.57(1.03+0.53) -26.6% 2000.25: git ... (sparse-v4) 2.20(1.62+0.57) 1.58(0.98+0.59) -28.2% These reductions of 26-28% are small compared to most examples, but the time is dominated by writing a new copy of the base repository to the worktree and then deleting it again. The fact that the previous index expansion was such a large portion of the time is telling how important it is to complete this sparse index integration. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 13:48:46 +00:00
test_expect_success 'sparse index is not expanded: sparse-checkout' '
init_repos &&
ensure_not_expanded sparse-checkout set deep/deeper2 &&
ensure_not_expanded sparse-checkout set deep/deeper1 &&
ensure_not_expanded sparse-checkout set deep &&
ensure_not_expanded sparse-checkout add folder1 &&
ensure_not_expanded sparse-checkout set deep/deeper1 &&
ensure_not_expanded sparse-checkout set folder2 &&
# Demonstrate that the checks that "folder1/a" is a file
# do not cause a sparse-index expansion (since it is in the
# sparse-checkout cone).
echo >>sparse-index/folder2/a &&
git -C sparse-index add folder2/a &&
ensure_not_expanded sparse-checkout add folder1 &&
# Skip checks here, since deep/deeper1 is inside a sparse directory
# that must be expanded to check whether `deep/deeper1` is a file
# or not.
ensure_not_expanded sparse-checkout set --skip-checks deep/deeper1 &&
ensure_not_expanded sparse-checkout set
'
# NEEDSWORK: a sparse-checkout behaves differently from a full checkout
# in this scenario, but it shouldn't.
test_expect_success 'reset mixed and checkout orphan' '
init_repos &&
test_all_match git checkout rename-out-to-in &&
# Sparse checkouts do not agree with full checkouts about
# how to report a directory/file conflict during a reset.
# This command would fail with test_all_match because the
# full checkout reports "T folder1/0/1" while a sparse
# checkout reports "D folder1/0/1". This matches because
# the sparse checkouts skip "adding" the other side of
# the conflict.
test_sparse_match git reset --mixed HEAD~1 &&
test_sparse_match git ls-files --stage &&
test_sparse_match git status --porcelain=v2 &&
# At this point, sparse-checkouts behave differently
# from the full-checkout.
test_sparse_match git checkout --orphan new-branch &&
test_sparse_match git ls-files --stage &&
test_sparse_match git status --porcelain=v2
'
test_expect_success 'add everything with deep new file' '
init_repos &&
run_on_sparse git sparse-checkout set deep/deeper1/deepest &&
run_on_all touch deep/deeper1/x &&
test_all_match git add . &&
test_all_match git status --porcelain=v2
'
t1092: document bad 'git checkout' behavior Add new branches to the test repo that demonstrate directory/file conflicts in different ways. Since the directory 'folder1/' has adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes searches for 'folder1/' to land in a different place in the index than a search for 'folder1'. This causes a change in behavior when working with the df-conflict-1 and df-conflict-2 branches, whose only difference is that the first uses 'folder1' as the conflict and the other uses 'folder2' which does not have these adjacent files. We can extend two tests that compare the behavior across different 'git checkout' commands, and we see already that the behavior will be different in some cases and not in others. The difference between the two test loops is that one uses 'git reset --hard' between iterations. Further, we isolate the behavior of creating a staged change within a directory and then checking out a branch where that directory is replaced with a file. A full checkout behaves differently across these two cases, while a sparse-checkout cone behaves consistently. In both cases, the behavior is wrong. In one case, the staged change is dropped entirely. The other case the staged change is kept, replacing the file at that location, but none of the other files in the directory are kept. Likely, the correct behavior in this case is to reject the checkout and report the conflict, leaving HEAD in its previous location. None of the cases behave this way currently. Use comments to demonstrate that the tested behavior is only a documentation of the current, incorrect behavior to ensure we do not _accidentally_ change it. Instead, we would prefer to change it on purpose with a future change. At this point, the sparse-index does not handle these 'git checkout' commands correctly. Or rather, it _does_ reject the 'git checkout' when we have the staged change, but for the wrong reason. It also rejects the 'git checkout' commands when there is no staged change and we want to replace a directory with a file. A fix for that unstaged case will follow in the next change, but that will make the sparse-index agree with the full checkout case in these documented incorrect behaviors. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:40 +00:00
# NEEDSWORK: 'git checkout' behaves incorrectly in the case of
# directory/file conflicts, even without sparse-checkout. Use this
# test only as a documentation of the incorrect behavior, not a
# measure of how it _should_ behave.
test_expect_success 'checkout behaves oddly with df-conflict-1' '
init_repos &&
test_sparse_match git sparse-checkout disable &&
write_script edit-content <<-\EOF &&
echo content >>folder1/larger-content
git add folder1
EOF
run_on_all ../edit-content &&
test_all_match git status --porcelain=v2 &&
git -C sparse-checkout sparse-checkout init --cone &&
git -C sparse-index sparse-checkout init --cone --sparse-index &&
test_all_match git status --porcelain=v2 &&
# This checkout command should fail, because we have a staged
# change to folder1/larger-content, but the destination changes
# folder1 to a file.
git -C full-checkout checkout df-conflict-1 \
1>full-checkout-out \
2>full-checkout-err &&
git -C sparse-checkout checkout df-conflict-1 \
1>sparse-checkout-out \
2>sparse-checkout-err &&
unpack-trees: resolve sparse-directory/file conflicts When running unpack_trees() with a sparse index, we attempt to operate on the index without expanding the sparse directory entries. Thus, we operate by manipulating entire directories and passing them to the unpack function. In the case of the 'git checkout' command, this is the twoway_merge() function. There are several cases in twoway_merge() that handle different situations. One new one to add is the case of a directory/file conflict where the directory is sparse. Before the sparse index, such a conflict would appear as a list of file additions and deletions. Now, twoway_merge() initializes 'current', 'oldtree', and 'newtree' from src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is equal to the df_conflict_entry. The way to determine that we have a directory/file conflict is to test that 'current' and 'newtree' disagree on being sparse directory entries. When we are in this case, we want to resolve the situation by calling merged_entry(). This allows replacing the 'current' entry with the 'newtree' entry. This is important for cases where we want to run 'git checkout' across the conflict and have the new HEAD represent the new file type at that path. The first NEEDSWORK comment dropped in t1092 demonstrates this necessary behavior. However, we still are in a confusing state when 'current' corresponds to a staged change within a sparse directory that is not present at HEAD. This should be atypical, because it requires adding a change outside of the sparse-checkout cone, but it is possible. Since we are unable to determine that this is a staged change within twoway_merge(), we cannot add a case to reject the merge at this point. I believe this is due to the use of df_conflict_entry in the place of 'oldtree' instead of using the valud at HEAD, which would provide some perspective to this decision. Any change that would allow this differentiation for staged entries would need to involve information further up in unpack_trees(). That work should be done, sometime, because we are further confusing the behavior of a directory/file conflict when staging a change in the directory. The two cases 'checkout behaves oddly with df-conflict-?' in t1092 demonstrate that even without a sparse-checkout, Git is not consistent in its behavior. Neither of the two options seems correct, either. This change makes the sparse-index behave differently than the typcial sparse-checkout case, but it does match the full checkout behavior in the df-conflict-2 case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:41 +00:00
git -C sparse-index checkout df-conflict-1 \
t1092: document bad 'git checkout' behavior Add new branches to the test repo that demonstrate directory/file conflicts in different ways. Since the directory 'folder1/' has adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes searches for 'folder1/' to land in a different place in the index than a search for 'folder1'. This causes a change in behavior when working with the df-conflict-1 and df-conflict-2 branches, whose only difference is that the first uses 'folder1' as the conflict and the other uses 'folder2' which does not have these adjacent files. We can extend two tests that compare the behavior across different 'git checkout' commands, and we see already that the behavior will be different in some cases and not in others. The difference between the two test loops is that one uses 'git reset --hard' between iterations. Further, we isolate the behavior of creating a staged change within a directory and then checking out a branch where that directory is replaced with a file. A full checkout behaves differently across these two cases, while a sparse-checkout cone behaves consistently. In both cases, the behavior is wrong. In one case, the staged change is dropped entirely. The other case the staged change is kept, replacing the file at that location, but none of the other files in the directory are kept. Likely, the correct behavior in this case is to reject the checkout and report the conflict, leaving HEAD in its previous location. None of the cases behave this way currently. Use comments to demonstrate that the tested behavior is only a documentation of the current, incorrect behavior to ensure we do not _accidentally_ change it. Instead, we would prefer to change it on purpose with a future change. At this point, the sparse-index does not handle these 'git checkout' commands correctly. Or rather, it _does_ reject the 'git checkout' when we have the staged change, but for the wrong reason. It also rejects the 'git checkout' commands when there is no staged change and we want to replace a directory with a file. A fix for that unstaged case will follow in the next change, but that will make the sparse-index agree with the full checkout case in these documented incorrect behaviors. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:40 +00:00
1>sparse-index-out \
2>sparse-index-err &&
# Instead, the checkout deletes the folder1 file and adds the
# folder1/larger-content file, leaving all other paths that were
# in folder1/ as deleted (without any warning).
cat >expect <<-EOF &&
D folder1
A folder1/larger-content
EOF
test_cmp expect full-checkout-out &&
test_cmp expect sparse-checkout-out &&
unpack-trees: resolve sparse-directory/file conflicts When running unpack_trees() with a sparse index, we attempt to operate on the index without expanding the sparse directory entries. Thus, we operate by manipulating entire directories and passing them to the unpack function. In the case of the 'git checkout' command, this is the twoway_merge() function. There are several cases in twoway_merge() that handle different situations. One new one to add is the case of a directory/file conflict where the directory is sparse. Before the sparse index, such a conflict would appear as a list of file additions and deletions. Now, twoway_merge() initializes 'current', 'oldtree', and 'newtree' from src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is equal to the df_conflict_entry. The way to determine that we have a directory/file conflict is to test that 'current' and 'newtree' disagree on being sparse directory entries. When we are in this case, we want to resolve the situation by calling merged_entry(). This allows replacing the 'current' entry with the 'newtree' entry. This is important for cases where we want to run 'git checkout' across the conflict and have the new HEAD represent the new file type at that path. The first NEEDSWORK comment dropped in t1092 demonstrates this necessary behavior. However, we still are in a confusing state when 'current' corresponds to a staged change within a sparse directory that is not present at HEAD. This should be atypical, because it requires adding a change outside of the sparse-checkout cone, but it is possible. Since we are unable to determine that this is a staged change within twoway_merge(), we cannot add a case to reject the merge at this point. I believe this is due to the use of df_conflict_entry in the place of 'oldtree' instead of using the valud at HEAD, which would provide some perspective to this decision. Any change that would allow this differentiation for staged entries would need to involve information further up in unpack_trees(). That work should be done, sometime, because we are further confusing the behavior of a directory/file conflict when staging a change in the directory. The two cases 'checkout behaves oddly with df-conflict-?' in t1092 demonstrate that even without a sparse-checkout, Git is not consistent in its behavior. Neither of the two options seems correct, either. This change makes the sparse-index behave differently than the typcial sparse-checkout case, but it does match the full checkout behavior in the df-conflict-2 case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:41 +00:00
# The sparse-index reports no output
test_must_be_empty sparse-index-out &&
t1092: document bad 'git checkout' behavior Add new branches to the test repo that demonstrate directory/file conflicts in different ways. Since the directory 'folder1/' has adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes searches for 'folder1/' to land in a different place in the index than a search for 'folder1'. This causes a change in behavior when working with the df-conflict-1 and df-conflict-2 branches, whose only difference is that the first uses 'folder1' as the conflict and the other uses 'folder2' which does not have these adjacent files. We can extend two tests that compare the behavior across different 'git checkout' commands, and we see already that the behavior will be different in some cases and not in others. The difference between the two test loops is that one uses 'git reset --hard' between iterations. Further, we isolate the behavior of creating a staged change within a directory and then checking out a branch where that directory is replaced with a file. A full checkout behaves differently across these two cases, while a sparse-checkout cone behaves consistently. In both cases, the behavior is wrong. In one case, the staged change is dropped entirely. The other case the staged change is kept, replacing the file at that location, but none of the other files in the directory are kept. Likely, the correct behavior in this case is to reject the checkout and report the conflict, leaving HEAD in its previous location. None of the cases behave this way currently. Use comments to demonstrate that the tested behavior is only a documentation of the current, incorrect behavior to ensure we do not _accidentally_ change it. Instead, we would prefer to change it on purpose with a future change. At this point, the sparse-index does not handle these 'git checkout' commands correctly. Or rather, it _does_ reject the 'git checkout' when we have the staged change, but for the wrong reason. It also rejects the 'git checkout' commands when there is no staged change and we want to replace a directory with a file. A fix for that unstaged case will follow in the next change, but that will make the sparse-index agree with the full checkout case in these documented incorrect behaviors. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:40 +00:00
# stderr: Switched to branch df-conflict-1
unpack-trees: resolve sparse-directory/file conflicts When running unpack_trees() with a sparse index, we attempt to operate on the index without expanding the sparse directory entries. Thus, we operate by manipulating entire directories and passing them to the unpack function. In the case of the 'git checkout' command, this is the twoway_merge() function. There are several cases in twoway_merge() that handle different situations. One new one to add is the case of a directory/file conflict where the directory is sparse. Before the sparse index, such a conflict would appear as a list of file additions and deletions. Now, twoway_merge() initializes 'current', 'oldtree', and 'newtree' from src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is equal to the df_conflict_entry. The way to determine that we have a directory/file conflict is to test that 'current' and 'newtree' disagree on being sparse directory entries. When we are in this case, we want to resolve the situation by calling merged_entry(). This allows replacing the 'current' entry with the 'newtree' entry. This is important for cases where we want to run 'git checkout' across the conflict and have the new HEAD represent the new file type at that path. The first NEEDSWORK comment dropped in t1092 demonstrates this necessary behavior. However, we still are in a confusing state when 'current' corresponds to a staged change within a sparse directory that is not present at HEAD. This should be atypical, because it requires adding a change outside of the sparse-checkout cone, but it is possible. Since we are unable to determine that this is a staged change within twoway_merge(), we cannot add a case to reject the merge at this point. I believe this is due to the use of df_conflict_entry in the place of 'oldtree' instead of using the valud at HEAD, which would provide some perspective to this decision. Any change that would allow this differentiation for staged entries would need to involve information further up in unpack_trees(). That work should be done, sometime, because we are further confusing the behavior of a directory/file conflict when staging a change in the directory. The two cases 'checkout behaves oddly with df-conflict-?' in t1092 demonstrate that even without a sparse-checkout, Git is not consistent in its behavior. Neither of the two options seems correct, either. This change makes the sparse-index behave differently than the typcial sparse-checkout case, but it does match the full checkout behavior in the df-conflict-2 case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:41 +00:00
test_cmp full-checkout-err sparse-checkout-err &&
t1092: document bad 'git checkout' behavior Add new branches to the test repo that demonstrate directory/file conflicts in different ways. Since the directory 'folder1/' has adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes searches for 'folder1/' to land in a different place in the index than a search for 'folder1'. This causes a change in behavior when working with the df-conflict-1 and df-conflict-2 branches, whose only difference is that the first uses 'folder1' as the conflict and the other uses 'folder2' which does not have these adjacent files. We can extend two tests that compare the behavior across different 'git checkout' commands, and we see already that the behavior will be different in some cases and not in others. The difference between the two test loops is that one uses 'git reset --hard' between iterations. Further, we isolate the behavior of creating a staged change within a directory and then checking out a branch where that directory is replaced with a file. A full checkout behaves differently across these two cases, while a sparse-checkout cone behaves consistently. In both cases, the behavior is wrong. In one case, the staged change is dropped entirely. The other case the staged change is kept, replacing the file at that location, but none of the other files in the directory are kept. Likely, the correct behavior in this case is to reject the checkout and report the conflict, leaving HEAD in its previous location. None of the cases behave this way currently. Use comments to demonstrate that the tested behavior is only a documentation of the current, incorrect behavior to ensure we do not _accidentally_ change it. Instead, we would prefer to change it on purpose with a future change. At this point, the sparse-index does not handle these 'git checkout' commands correctly. Or rather, it _does_ reject the 'git checkout' when we have the staged change, but for the wrong reason. It also rejects the 'git checkout' commands when there is no staged change and we want to replace a directory with a file. A fix for that unstaged case will follow in the next change, but that will make the sparse-index agree with the full checkout case in these documented incorrect behaviors. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:40 +00:00
test_cmp full-checkout-err sparse-checkout-err
'
# NEEDSWORK: 'git checkout' behaves incorrectly in the case of
# directory/file conflicts, even without sparse-checkout. Use this
# test only as a documentation of the incorrect behavior, not a
# measure of how it _should_ behave.
test_expect_success 'checkout behaves oddly with df-conflict-2' '
init_repos &&
test_sparse_match git sparse-checkout disable &&
write_script edit-content <<-\EOF &&
echo content >>folder2/larger-content
git add folder2
EOF
run_on_all ../edit-content &&
test_all_match git status --porcelain=v2 &&
git -C sparse-checkout sparse-checkout init --cone &&
git -C sparse-index sparse-checkout init --cone --sparse-index &&
test_all_match git status --porcelain=v2 &&
# This checkout command should fail, because we have a staged
# change to folder1/larger-content, but the destination changes
# folder1 to a file.
git -C full-checkout checkout df-conflict-2 \
1>full-checkout-out \
2>full-checkout-err &&
git -C sparse-checkout checkout df-conflict-2 \
1>sparse-checkout-out \
2>sparse-checkout-err &&
unpack-trees: resolve sparse-directory/file conflicts When running unpack_trees() with a sparse index, we attempt to operate on the index without expanding the sparse directory entries. Thus, we operate by manipulating entire directories and passing them to the unpack function. In the case of the 'git checkout' command, this is the twoway_merge() function. There are several cases in twoway_merge() that handle different situations. One new one to add is the case of a directory/file conflict where the directory is sparse. Before the sparse index, such a conflict would appear as a list of file additions and deletions. Now, twoway_merge() initializes 'current', 'oldtree', and 'newtree' from src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is equal to the df_conflict_entry. The way to determine that we have a directory/file conflict is to test that 'current' and 'newtree' disagree on being sparse directory entries. When we are in this case, we want to resolve the situation by calling merged_entry(). This allows replacing the 'current' entry with the 'newtree' entry. This is important for cases where we want to run 'git checkout' across the conflict and have the new HEAD represent the new file type at that path. The first NEEDSWORK comment dropped in t1092 demonstrates this necessary behavior. However, we still are in a confusing state when 'current' corresponds to a staged change within a sparse directory that is not present at HEAD. This should be atypical, because it requires adding a change outside of the sparse-checkout cone, but it is possible. Since we are unable to determine that this is a staged change within twoway_merge(), we cannot add a case to reject the merge at this point. I believe this is due to the use of df_conflict_entry in the place of 'oldtree' instead of using the valud at HEAD, which would provide some perspective to this decision. Any change that would allow this differentiation for staged entries would need to involve information further up in unpack_trees(). That work should be done, sometime, because we are further confusing the behavior of a directory/file conflict when staging a change in the directory. The two cases 'checkout behaves oddly with df-conflict-?' in t1092 demonstrate that even without a sparse-checkout, Git is not consistent in its behavior. Neither of the two options seems correct, either. This change makes the sparse-index behave differently than the typcial sparse-checkout case, but it does match the full checkout behavior in the df-conflict-2 case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:41 +00:00
git -C sparse-index checkout df-conflict-2 \
t1092: document bad 'git checkout' behavior Add new branches to the test repo that demonstrate directory/file conflicts in different ways. Since the directory 'folder1/' has adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes searches for 'folder1/' to land in a different place in the index than a search for 'folder1'. This causes a change in behavior when working with the df-conflict-1 and df-conflict-2 branches, whose only difference is that the first uses 'folder1' as the conflict and the other uses 'folder2' which does not have these adjacent files. We can extend two tests that compare the behavior across different 'git checkout' commands, and we see already that the behavior will be different in some cases and not in others. The difference between the two test loops is that one uses 'git reset --hard' between iterations. Further, we isolate the behavior of creating a staged change within a directory and then checking out a branch where that directory is replaced with a file. A full checkout behaves differently across these two cases, while a sparse-checkout cone behaves consistently. In both cases, the behavior is wrong. In one case, the staged change is dropped entirely. The other case the staged change is kept, replacing the file at that location, but none of the other files in the directory are kept. Likely, the correct behavior in this case is to reject the checkout and report the conflict, leaving HEAD in its previous location. None of the cases behave this way currently. Use comments to demonstrate that the tested behavior is only a documentation of the current, incorrect behavior to ensure we do not _accidentally_ change it. Instead, we would prefer to change it on purpose with a future change. At this point, the sparse-index does not handle these 'git checkout' commands correctly. Or rather, it _does_ reject the 'git checkout' when we have the staged change, but for the wrong reason. It also rejects the 'git checkout' commands when there is no staged change and we want to replace a directory with a file. A fix for that unstaged case will follow in the next change, but that will make the sparse-index agree with the full checkout case in these documented incorrect behaviors. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:40 +00:00
1>sparse-index-out \
2>sparse-index-err &&
# The full checkout deviates from the df-conflict-1 case here!
# It drops the change to folder1/larger-content and leaves the
unpack-trees: resolve sparse-directory/file conflicts When running unpack_trees() with a sparse index, we attempt to operate on the index without expanding the sparse directory entries. Thus, we operate by manipulating entire directories and passing them to the unpack function. In the case of the 'git checkout' command, this is the twoway_merge() function. There are several cases in twoway_merge() that handle different situations. One new one to add is the case of a directory/file conflict where the directory is sparse. Before the sparse index, such a conflict would appear as a list of file additions and deletions. Now, twoway_merge() initializes 'current', 'oldtree', and 'newtree' from src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is equal to the df_conflict_entry. The way to determine that we have a directory/file conflict is to test that 'current' and 'newtree' disagree on being sparse directory entries. When we are in this case, we want to resolve the situation by calling merged_entry(). This allows replacing the 'current' entry with the 'newtree' entry. This is important for cases where we want to run 'git checkout' across the conflict and have the new HEAD represent the new file type at that path. The first NEEDSWORK comment dropped in t1092 demonstrates this necessary behavior. However, we still are in a confusing state when 'current' corresponds to a staged change within a sparse directory that is not present at HEAD. This should be atypical, because it requires adding a change outside of the sparse-checkout cone, but it is possible. Since we are unable to determine that this is a staged change within twoway_merge(), we cannot add a case to reject the merge at this point. I believe this is due to the use of df_conflict_entry in the place of 'oldtree' instead of using the valud at HEAD, which would provide some perspective to this decision. Any change that would allow this differentiation for staged entries would need to involve information further up in unpack_trees(). That work should be done, sometime, because we are further confusing the behavior of a directory/file conflict when staging a change in the directory. The two cases 'checkout behaves oddly with df-conflict-?' in t1092 demonstrate that even without a sparse-checkout, Git is not consistent in its behavior. Neither of the two options seems correct, either. This change makes the sparse-index behave differently than the typcial sparse-checkout case, but it does match the full checkout behavior in the df-conflict-2 case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:41 +00:00
# folder1 path as-is on disk. The sparse-index behaves the same.
t1092: document bad 'git checkout' behavior Add new branches to the test repo that demonstrate directory/file conflicts in different ways. Since the directory 'folder1/' has adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes searches for 'folder1/' to land in a different place in the index than a search for 'folder1'. This causes a change in behavior when working with the df-conflict-1 and df-conflict-2 branches, whose only difference is that the first uses 'folder1' as the conflict and the other uses 'folder2' which does not have these adjacent files. We can extend two tests that compare the behavior across different 'git checkout' commands, and we see already that the behavior will be different in some cases and not in others. The difference between the two test loops is that one uses 'git reset --hard' between iterations. Further, we isolate the behavior of creating a staged change within a directory and then checking out a branch where that directory is replaced with a file. A full checkout behaves differently across these two cases, while a sparse-checkout cone behaves consistently. In both cases, the behavior is wrong. In one case, the staged change is dropped entirely. The other case the staged change is kept, replacing the file at that location, but none of the other files in the directory are kept. Likely, the correct behavior in this case is to reject the checkout and report the conflict, leaving HEAD in its previous location. None of the cases behave this way currently. Use comments to demonstrate that the tested behavior is only a documentation of the current, incorrect behavior to ensure we do not _accidentally_ change it. Instead, we would prefer to change it on purpose with a future change. At this point, the sparse-index does not handle these 'git checkout' commands correctly. Or rather, it _does_ reject the 'git checkout' when we have the staged change, but for the wrong reason. It also rejects the 'git checkout' commands when there is no staged change and we want to replace a directory with a file. A fix for that unstaged case will follow in the next change, but that will make the sparse-index agree with the full checkout case in these documented incorrect behaviors. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:40 +00:00
test_must_be_empty full-checkout-out &&
unpack-trees: resolve sparse-directory/file conflicts When running unpack_trees() with a sparse index, we attempt to operate on the index without expanding the sparse directory entries. Thus, we operate by manipulating entire directories and passing them to the unpack function. In the case of the 'git checkout' command, this is the twoway_merge() function. There are several cases in twoway_merge() that handle different situations. One new one to add is the case of a directory/file conflict where the directory is sparse. Before the sparse index, such a conflict would appear as a list of file additions and deletions. Now, twoway_merge() initializes 'current', 'oldtree', and 'newtree' from src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is equal to the df_conflict_entry. The way to determine that we have a directory/file conflict is to test that 'current' and 'newtree' disagree on being sparse directory entries. When we are in this case, we want to resolve the situation by calling merged_entry(). This allows replacing the 'current' entry with the 'newtree' entry. This is important for cases where we want to run 'git checkout' across the conflict and have the new HEAD represent the new file type at that path. The first NEEDSWORK comment dropped in t1092 demonstrates this necessary behavior. However, we still are in a confusing state when 'current' corresponds to a staged change within a sparse directory that is not present at HEAD. This should be atypical, because it requires adding a change outside of the sparse-checkout cone, but it is possible. Since we are unable to determine that this is a staged change within twoway_merge(), we cannot add a case to reject the merge at this point. I believe this is due to the use of df_conflict_entry in the place of 'oldtree' instead of using the valud at HEAD, which would provide some perspective to this decision. Any change that would allow this differentiation for staged entries would need to involve information further up in unpack_trees(). That work should be done, sometime, because we are further confusing the behavior of a directory/file conflict when staging a change in the directory. The two cases 'checkout behaves oddly with df-conflict-?' in t1092 demonstrate that even without a sparse-checkout, Git is not consistent in its behavior. Neither of the two options seems correct, either. This change makes the sparse-index behave differently than the typcial sparse-checkout case, but it does match the full checkout behavior in the df-conflict-2 case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:41 +00:00
test_must_be_empty sparse-index-out &&
t1092: document bad 'git checkout' behavior Add new branches to the test repo that demonstrate directory/file conflicts in different ways. Since the directory 'folder1/' has adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes searches for 'folder1/' to land in a different place in the index than a search for 'folder1'. This causes a change in behavior when working with the df-conflict-1 and df-conflict-2 branches, whose only difference is that the first uses 'folder1' as the conflict and the other uses 'folder2' which does not have these adjacent files. We can extend two tests that compare the behavior across different 'git checkout' commands, and we see already that the behavior will be different in some cases and not in others. The difference between the two test loops is that one uses 'git reset --hard' between iterations. Further, we isolate the behavior of creating a staged change within a directory and then checking out a branch where that directory is replaced with a file. A full checkout behaves differently across these two cases, while a sparse-checkout cone behaves consistently. In both cases, the behavior is wrong. In one case, the staged change is dropped entirely. The other case the staged change is kept, replacing the file at that location, but none of the other files in the directory are kept. Likely, the correct behavior in this case is to reject the checkout and report the conflict, leaving HEAD in its previous location. None of the cases behave this way currently. Use comments to demonstrate that the tested behavior is only a documentation of the current, incorrect behavior to ensure we do not _accidentally_ change it. Instead, we would prefer to change it on purpose with a future change. At this point, the sparse-index does not handle these 'git checkout' commands correctly. Or rather, it _does_ reject the 'git checkout' when we have the staged change, but for the wrong reason. It also rejects the 'git checkout' commands when there is no staged change and we want to replace a directory with a file. A fix for that unstaged case will follow in the next change, but that will make the sparse-index agree with the full checkout case in these documented incorrect behaviors. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:40 +00:00
# In the sparse-checkout case, the checkout deletes the folder1
# file and adds the folder1/larger-content file, leaving all other
# paths that were in folder1/ as deleted (without any warning).
cat >expect <<-EOF &&
D folder2
A folder2/larger-content
EOF
test_cmp expect sparse-checkout-out &&
# Switched to branch df-conflict-1
unpack-trees: resolve sparse-directory/file conflicts When running unpack_trees() with a sparse index, we attempt to operate on the index without expanding the sparse directory entries. Thus, we operate by manipulating entire directories and passing them to the unpack function. In the case of the 'git checkout' command, this is the twoway_merge() function. There are several cases in twoway_merge() that handle different situations. One new one to add is the case of a directory/file conflict where the directory is sparse. Before the sparse index, such a conflict would appear as a list of file additions and deletions. Now, twoway_merge() initializes 'current', 'oldtree', and 'newtree' from src[0], src[1], and src[2], then sets 'oldtree' to NULL because it is equal to the df_conflict_entry. The way to determine that we have a directory/file conflict is to test that 'current' and 'newtree' disagree on being sparse directory entries. When we are in this case, we want to resolve the situation by calling merged_entry(). This allows replacing the 'current' entry with the 'newtree' entry. This is important for cases where we want to run 'git checkout' across the conflict and have the new HEAD represent the new file type at that path. The first NEEDSWORK comment dropped in t1092 demonstrates this necessary behavior. However, we still are in a confusing state when 'current' corresponds to a staged change within a sparse directory that is not present at HEAD. This should be atypical, because it requires adding a change outside of the sparse-checkout cone, but it is possible. Since we are unable to determine that this is a staged change within twoway_merge(), we cannot add a case to reject the merge at this point. I believe this is due to the use of df_conflict_entry in the place of 'oldtree' instead of using the valud at HEAD, which would provide some perspective to this decision. Any change that would allow this differentiation for staged entries would need to involve information further up in unpack_trees(). That work should be done, sometime, because we are further confusing the behavior of a directory/file conflict when staging a change in the directory. The two cases 'checkout behaves oddly with df-conflict-?' in t1092 demonstrate that even without a sparse-checkout, Git is not consistent in its behavior. Neither of the two options seems correct, either. This change makes the sparse-index behave differently than the typcial sparse-checkout case, but it does match the full checkout behavior in the df-conflict-2 case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:41 +00:00
test_cmp full-checkout-err sparse-checkout-err &&
test_cmp full-checkout-err sparse-index-err
t1092: document bad 'git checkout' behavior Add new branches to the test repo that demonstrate directory/file conflicts in different ways. Since the directory 'folder1/' has adjacent files 'folder1-', 'folder1.txt', and 'folder10' it causes searches for 'folder1/' to land in a different place in the index than a search for 'folder1'. This causes a change in behavior when working with the df-conflict-1 and df-conflict-2 branches, whose only difference is that the first uses 'folder1' as the conflict and the other uses 'folder2' which does not have these adjacent files. We can extend two tests that compare the behavior across different 'git checkout' commands, and we see already that the behavior will be different in some cases and not in others. The difference between the two test loops is that one uses 'git reset --hard' between iterations. Further, we isolate the behavior of creating a staged change within a directory and then checking out a branch where that directory is replaced with a file. A full checkout behaves differently across these two cases, while a sparse-checkout cone behaves consistently. In both cases, the behavior is wrong. In one case, the staged change is dropped entirely. The other case the staged change is kept, replacing the file at that location, but none of the other files in the directory are kept. Likely, the correct behavior in this case is to reject the checkout and report the conflict, leaving HEAD in its previous location. None of the cases behave this way currently. Use comments to demonstrate that the tested behavior is only a documentation of the current, incorrect behavior to ensure we do not _accidentally_ change it. Instead, we would prefer to change it on purpose with a future change. At this point, the sparse-index does not handle these 'git checkout' commands correctly. Or rather, it _does_ reject the 'git checkout' when we have the staged change, but for the wrong reason. It also rejects the 'git checkout' commands when there is no staged change and we want to replace a directory with a file. A fix for that unstaged case will follow in the next change, but that will make the sparse-index agree with the full checkout case in these documented incorrect behaviors. Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 20:14:40 +00:00
'
test_expect_success 'mv directory from out-of-cone to in-cone' '
init_repos &&
# <source> as a sparse directory (or SKIP_WORKTREE_DIR without enabling
# sparse index).
test_all_match git mv --sparse folder1 deep &&
test_all_match git status --porcelain=v2 &&
test_sparse_match git ls-files -t &&
git -C sparse-checkout ls-files -t >actual &&
grep -e "H deep/folder1/0/0/0" actual &&
grep -e "H deep/folder1/0/1" actual &&
grep -e "H deep/folder1/a" actual &&
test_all_match git reset --hard &&
# <source> as a directory deeper than sparse index boundary (where
# sparse index will expand).
test_sparse_match git mv --sparse folder1/0 deep &&
test_sparse_match git status --porcelain=v2 &&
test_sparse_match git ls-files -t &&
git -C sparse-checkout ls-files -t >actual &&
grep -e "H deep/0/0/0" actual &&
grep -e "H deep/0/1" actual
'
test_expect_success 'rm pathspec inside sparse definition' '
init_repos &&
test_all_match git rm deep/a &&
test_all_match git status --porcelain=v2 &&
# test wildcard
run_on_all git reset --hard &&
test_all_match git rm deep/* &&
test_all_match git status --porcelain=v2 &&
# test recursive rm
run_on_all git reset --hard &&
test_all_match git rm -r deep &&
test_all_match git status --porcelain=v2
'
rm: integrate with sparse-index Enable the sparse index within the `git-rm` command. The `p2000` tests demonstrate a ~92% execution time reduction for 'git rm' using a sparse index. Test HEAD~1 HEAD -------------------------------------------------------------------------- 2000.74: git rm ... (full-v3) 0.41(0.37+0.05) 0.43(0.36+0.07) +4.9% 2000.75: git rm ... (full-v4) 0.38(0.34+0.05) 0.39(0.35+0.05) +2.6% 2000.76: git rm ... (sparse-v3) 0.57(0.56+0.01) 0.05(0.05+0.00) -91.2% 2000.77: git rm ... (sparse-v4) 0.57(0.55+0.02) 0.03(0.03+0.00) -94.7% ---- Also, normalize a behavioral difference of `git-rm` under sparse-index. See related discussion [1]. `git-rm` a sparse-directory entry within a sparse-index enabled repo behaves differently from a sparse directory within a sparse-checkout enabled repo. For example, in a sparse-index repo, where 'folder1' is a sparse-directory entry, `git rm -r --sparse folder1` provides this: rm 'folder1/' Whereas in a sparse-checkout repo *without* sparse-index, doing so provides this: rm 'folder1/0/0/0' rm 'folder1/0/1' rm 'folder1/a' Because `git rm` a sparse-directory entry does not need to expand the index, therefore we should accept the current behavior, which is faster than "expand the sparse-directory entry to match the sparse-checkout situation". Modify a previous test so such difference is not considered as an error. [1] https://github.com/ffyuanda/git/pull/6#discussion_r934861398 Helped-by: Victoria Dye <vdye@github.com> Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-07 04:13:35 +00:00
test_expect_success 'rm pathspec outside sparse definition' '
init_repos &&
for file in folder1/a folder1/0/1
do
test_sparse_match test_must_fail git rm $file &&
test_sparse_match test_must_fail git rm --cached $file &&
test_sparse_match git rm --sparse $file &&
test_sparse_match git status --porcelain=v2 || return 1
done &&
cat >folder1-full <<-EOF &&
rm ${SQ}folder1/0/0/0${SQ}
rm ${SQ}folder1/0/1${SQ}
rm ${SQ}folder1/a${SQ}
EOF
cat >folder1-sparse <<-EOF &&
rm ${SQ}folder1/${SQ}
EOF
# test wildcard
run_on_sparse git reset --hard &&
run_on_sparse git sparse-checkout reapply &&
test_sparse_match test_must_fail git rm folder1/* &&
run_on_sparse git rm --sparse folder1/* &&
test_cmp folder1-full sparse-checkout-out &&
test_cmp folder1-sparse sparse-index-out &&
test_sparse_match git status --porcelain=v2 &&
# test recursive rm
run_on_sparse git reset --hard &&
run_on_sparse git sparse-checkout reapply &&
test_sparse_match test_must_fail git rm --sparse folder1 &&
run_on_sparse git rm --sparse -r folder1 &&
test_cmp folder1-full sparse-checkout-out &&
test_cmp folder1-sparse sparse-index-out &&
test_sparse_match git status --porcelain=v2
'
rm: integrate with sparse-index Enable the sparse index within the `git-rm` command. The `p2000` tests demonstrate a ~92% execution time reduction for 'git rm' using a sparse index. Test HEAD~1 HEAD -------------------------------------------------------------------------- 2000.74: git rm ... (full-v3) 0.41(0.37+0.05) 0.43(0.36+0.07) +4.9% 2000.75: git rm ... (full-v4) 0.38(0.34+0.05) 0.39(0.35+0.05) +2.6% 2000.76: git rm ... (sparse-v3) 0.57(0.56+0.01) 0.05(0.05+0.00) -91.2% 2000.77: git rm ... (sparse-v4) 0.57(0.55+0.02) 0.03(0.03+0.00) -94.7% ---- Also, normalize a behavioral difference of `git-rm` under sparse-index. See related discussion [1]. `git-rm` a sparse-directory entry within a sparse-index enabled repo behaves differently from a sparse directory within a sparse-checkout enabled repo. For example, in a sparse-index repo, where 'folder1' is a sparse-directory entry, `git rm -r --sparse folder1` provides this: rm 'folder1/' Whereas in a sparse-checkout repo *without* sparse-index, doing so provides this: rm 'folder1/0/0/0' rm 'folder1/0/1' rm 'folder1/a' Because `git rm` a sparse-directory entry does not need to expand the index, therefore we should accept the current behavior, which is faster than "expand the sparse-directory entry to match the sparse-checkout situation". Modify a previous test so such difference is not considered as an error. [1] https://github.com/ffyuanda/git/pull/6#discussion_r934861398 Helped-by: Victoria Dye <vdye@github.com> Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-07 04:13:35 +00:00
test_expect_success 'rm pathspec expands index when necessary' '
init_repos &&
# in-cone pathspec (do not expand)
ensure_not_expanded rm "deep/deep*" &&
test_must_be_empty sparse-index-err &&
# out-of-cone pathspec (expand)
! ensure_not_expanded rm --sparse "folder1/a*" &&
test_must_be_empty sparse-index-err &&
# pathspec that should expand index
! ensure_not_expanded rm "*/a" &&
test_must_be_empty sparse-index-err &&
! ensure_not_expanded rm "**a" &&
test_must_be_empty sparse-index-err
'
rm: integrate with sparse-index Enable the sparse index within the `git-rm` command. The `p2000` tests demonstrate a ~92% execution time reduction for 'git rm' using a sparse index. Test HEAD~1 HEAD -------------------------------------------------------------------------- 2000.74: git rm ... (full-v3) 0.41(0.37+0.05) 0.43(0.36+0.07) +4.9% 2000.75: git rm ... (full-v4) 0.38(0.34+0.05) 0.39(0.35+0.05) +2.6% 2000.76: git rm ... (sparse-v3) 0.57(0.56+0.01) 0.05(0.05+0.00) -91.2% 2000.77: git rm ... (sparse-v4) 0.57(0.55+0.02) 0.03(0.03+0.00) -94.7% ---- Also, normalize a behavioral difference of `git-rm` under sparse-index. See related discussion [1]. `git-rm` a sparse-directory entry within a sparse-index enabled repo behaves differently from a sparse directory within a sparse-checkout enabled repo. For example, in a sparse-index repo, where 'folder1' is a sparse-directory entry, `git rm -r --sparse folder1` provides this: rm 'folder1/' Whereas in a sparse-checkout repo *without* sparse-index, doing so provides this: rm 'folder1/0/0/0' rm 'folder1/0/1' rm 'folder1/a' Because `git rm` a sparse-directory entry does not need to expand the index, therefore we should accept the current behavior, which is faster than "expand the sparse-directory entry to match the sparse-checkout situation". Modify a previous test so such difference is not considered as an error. [1] https://github.com/ffyuanda/git/pull/6#discussion_r934861398 Helped-by: Victoria Dye <vdye@github.com> Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-07 04:13:35 +00:00
test_expect_success 'sparse index is not expanded: rm' '
init_repos &&
ensure_not_expanded rm deep/a &&
# test in-cone wildcard
git -C sparse-index reset --hard &&
ensure_not_expanded rm deep/* &&
# test recursive rm
git -C sparse-index reset --hard &&
ensure_not_expanded rm -r deep
'
builtin/grep.c: integrate with sparse index Turn on sparse index and remove ensure_full_index(). Before this patch, `git-grep` utilizes the ensure_full_index() method to expand the index and search all the entries. Because this method requires walking all the trees and constructing the index, it is the slow part within the whole command. To achieve better performance, this patch uses grep_tree() to search the sparse directory entries and get rid of the ensure_full_index() method. Why grep_tree() is a better choice over ensure_full_index()? 1) grep_tree() is as correct as ensure_full_index(). grep_tree() looks into every sparse-directory entry (represented by a tree) recursively when looping over the index, and the result of doing so matches the result of expanding the index. 2) grep_tree() utilizes pathspecs to limit the scope of searching. ensure_full_index() always expands the index, which means it will always walk all the trees and blobs in the repo without caring if the user only wants a subset of the content, i.e. using a pathspec. On the other hand, grep_tree() will only search the contents that match the pathspec, and thus possibly walking fewer trees. 3) grep_tree() does not construct and copy back a new index, while ensure_full_index() does. This also saves some time. ---------------- Performance test - Summary: p2000 tests demonstrate a ~71% execution time reduction for `git grep --cached bogus -- "f2/f1/f1/*"` using tree-walking logic. However, notice that this result varies depending on the pathspec given. See below "Command used for testing" for more details. Test HEAD~ HEAD ------------------------------------------------------- 2000.78: git grep ... (full-v3) 0.35 0.39 (≈) 2000.79: git grep ... (full-v4) 0.36 0.30 (≈) 2000.80: git grep ... (sparse-v3) 0.88 0.23 (-73.8%) 2000.81: git grep ... (sparse-v4) 0.83 0.26 (-68.6%) - Command used for testing: git grep --cached bogus -- "f2/f1/f1/*" The reason for specifying a pathspec is that, if we don't specify a pathspec, then grep_tree() will walk all the trees and blobs to find the pattern, and the time consumed doing so is not too different from using the original ensure_full_index() method, which also spends most of the time walking trees. However, when a pathspec is specified, this latest logic will only walk the area of trees enclosed by the pathspec, and the time consumed is reasonably a lot less. Generally speaking, because the performance gain is acheived by walking less trees, which are specified by the pathspec, the HEAD time v.s. HEAD~ time in sparse-v[3|4], should be proportional to "pathspec enclosed area" v.s. "all area", respectively. Namely, the wider the <pathspec> is encompassing, the less the performance difference between HEAD~ and HEAD, and vice versa. That is, if we don't specify a pathspec, the performance difference [1] is indistinguishable: both methods walk all the trees and take generally same amount of time (even with the index construction time included for ensure_full_index()). [1] Performance test result without pathspec (hence walking all trees): Command used: git grep --cached bogus Test HEAD~ HEAD --------------------------------------------------- 2000.78: git grep ... (full-v3) 6.17 5.19 (≈) 2000.79: git grep ... (full-v4) 6.19 5.46 (≈) 2000.80: git grep ... (sparse-v3) 6.57 6.44 (≈) 2000.81: git grep ... (sparse-v4) 6.65 6.28 (≈) -------------------------- NEEDSWORK about submodules There are a few NEEDSWORKs that belong to improvements beyond this topic. See the NEEDSWORK in builtin/grep.c::grep_submodule() for more context. The other two NEEDSWORKs in t1092 are also relative. Suggested-by: Derrick Stolee <derrickstolee@github.com> Helped-by: Derrick Stolee <derrickstolee@github.com> Helped-by: Victoria Dye <vdye@github.com> Helped-by: Elijah Newren <newren@gmail.com> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-23 04:18:42 +00:00
test_expect_success 'grep with and --cached' '
init_repos &&
test_all_match git grep --cached a &&
test_all_match git grep --cached a -- "folder1/*"
'
test_expect_success 'grep is not expanded' '
init_repos &&
ensure_not_expanded grep a &&
ensure_not_expanded grep a -- deep/* &&
# All files within the folder1/* pathspec are sparse,
# so this command does not find any matches
ensure_not_expanded ! grep a -- folder1/* &&
# test out-of-cone pathspec with or without wildcard
ensure_not_expanded grep --cached a -- "folder1/a" &&
ensure_not_expanded grep --cached a -- "folder1/*" &&
# test in-cone pathspec with or without wildcard
ensure_not_expanded grep --cached a -- "deep/a" &&
ensure_not_expanded grep --cached a -- "deep/*"
'
# NEEDSWORK: when running `grep` in the superproject with --recurse-submodules,
# Git expands the index of the submodules unexpectedly. Even though `grep`
# builtin is marked as "command_requires_full_index = 0", this config is only
# useful for the superproject. Namely, the submodules have their own configs,
# which are _not_ populated by the one-time sparse-index feature switch.
test_expect_failure 'grep within submodules is not expanded' '
init_repos_as_submodules &&
# do not use ensure_not_expanded() here, becasue `grep` should be
# run in the superproject, not in "./sparse-index"
GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
git grep --cached --recurse-submodules a -- "*/folder1/*" &&
test_region ! index ensure_full_index trace2.txt
'
# NEEDSWORK: this test is not actually testing the code. The design purpose
# of this test is to verify the grep result when the submodules are using a
# sparse-index. Namely, we want "folder1/" as a tree (a sparse directory); but
# because of the index expansion, we are now grepping the "folder1/a" blob.
# Because of the problem stated above 'grep within submodules is not expanded',
# we don't have the ideal test environment yet.
test_expect_success 'grep sparse directory within submodules' '
init_repos_as_submodules &&
cat >expect <<-\EOF &&
full-checkout/folder1/a:a
sparse-checkout/folder1/a:a
sparse-index/folder1/a:a
EOF
git grep --cached --recurse-submodules a -- "*/folder1/*" >actual &&
test_cmp actual expect
'
test_expect_success 'write-tree' '
init_repos &&
test_all_match git write-tree &&
write_script edit-contents <<-\EOF &&
echo text >>"$1"
EOF
# make a change inside the sparse cone
run_on_all ../edit-contents deep/a &&
test_all_match git update-index deep/a &&
test_all_match git write-tree &&
test_all_match git status --porcelain=v2 &&
# make a change outside the sparse cone
run_on_all mkdir -p folder1 &&
run_on_all cp a folder1/a &&
run_on_all ../edit-contents folder1/a &&
test_all_match git update-index folder1/a &&
test_all_match git write-tree &&
test_all_match git status --porcelain=v2 &&
# check that SKIP_WORKTREE files are not materialized
test_path_is_missing sparse-checkout/folder2/a &&
test_path_is_missing sparse-index/folder2/a
'
test_expect_success 'sparse-index is not expanded: write-tree' '
init_repos &&
ensure_not_expanded write-tree &&
echo "test1" >>sparse-index/a &&
git -C sparse-index update-index a &&
ensure_not_expanded write-tree
'
test_expect_success 'diff-files with pathspec inside sparse definition' '
init_repos &&
write_script edit-contents <<-\EOF &&
echo text >>"$1"
EOF
run_on_all ../edit-contents deep/a &&
test_all_match git diff-files &&
test_all_match git diff-files -- deep/a &&
# test wildcard
test_all_match git diff-files -- "deep/*"
'
test_expect_success 'diff-files with pathspec outside sparse definition' '
init_repos &&
test_sparse_match git diff-files -- folder2/a &&
write_script edit-contents <<-\EOF &&
echo text >>"$1"
EOF
# The directory "folder1" is outside the cone of interest
# and will not exist in the sparse checkout repositories.
# Create it as needed, add file "folder1/a" there with
# contents that is different from the staged version.
run_on_all mkdir -p folder1 &&
run_on_all cp a folder1/a &&
run_on_all ../edit-contents folder1/a &&
test_all_match git diff-files &&
test_all_match git diff-files -- folder1/a &&
test_all_match git diff-files -- "folder*/a"
'
test_expect_success 'sparse index is not expanded: diff-files' '
init_repos &&
write_script edit-contents <<-\EOF &&
echo text >>"$1"
EOF
run_on_all ../edit-contents deep/a &&
ensure_not_expanded diff-files &&
ensure_not_expanded diff-files -- deep/a &&
ensure_not_expanded diff-files -- "deep/*"
'
diff-tree: integrate with sparse index The index is read in 'cmd_diff_tree' at two points: 1. The first index read was added in fd66bcc31ff (diff-tree: read the index so attribute checks work in bare repositories, 2017-12-06) to deal with reading '.gitattributes' content. 77efbb366ab (attr: be careful about sparse directories, 2021-09-08) established that, in a sparse index, we do _not_ try to load a '.gitattributes' file from within a sparse directory. 2. The second index access point is involved in rename detection, specifically when reading from stdin.This was initially added in f0c6b2a2fd9 ([PATCH] Optimize diff-tree -[CM]--stdin, 2005-05-27), where 'setup' was set to 'DIFF_SETUP_USE_SIZE_CACHE |DIFF_SETUP_USE_CACHE'. That assignment was later modified to drop the'DIFF_SETUP_USE_CACHE' in ff7fe37b053 (diff.c: move read_index() code back to the caller, 2018-08-13).However, 'DIFF_SETUP_USE_SIZE_CACHE' seems to be unused as of 6e0b8ed6d35 (diff.c: do not use a separate "size cache"., 2007-05-07) and nothing about 'detect_rename' otherwise indicates index usage. Hence we can just set the requires-full-index to false for "diff-tree". Add tests that verify that 'git diff-tree' behaves correctly when the sparse index is enabled and test to ensure the index is not expanded. The `p2000` tests demonstrate a ~98% execution time reduction for 'git diff-tree' using a sparse index: Test before after ----------------------------------------------------------------------- 2000.94: git diff-tree HEAD (full-v3) 0.05 0.04 -20.0% 2000.95: git diff-tree HEAD (full-v4) 0.06 0.05 -16.7% 2000.96: git diff-tree HEAD (sparse-v3) 0.59 0.01 -98.3% 2000.97: git diff-tree HEAD (sparse-v4) 0.61 0.01 -98.4% 2000.98: git diff-tree HEAD -- f2/f4/a (full-v3) 0.05 0.05 +0.0% 2000.99: git diff-tree HEAD -- f2/f4/a (full-v4) 0.05 0.04 -20.0% 2000.100: git diff-tree HEAD -- f2/f4/a (sparse-v3) 0.58 0.01 -98.3% 2000.101: git diff-tree HEAD -- f2/f4/a (sparse-v4) 0.55 0.01 -98.2% Helped-by: Victoria Dye <vdye@github.com> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-18 15:44:54 +00:00
test_expect_success 'diff-tree' '
init_repos &&
# Test change inside sparse cone
tree1=$(git -C sparse-index rev-parse HEAD^{tree}) &&
tree2=$(git -C sparse-index rev-parse update-deep^{tree}) &&
test_all_match git diff-tree $tree1 $tree2 &&
test_all_match git diff-tree $tree1 $tree2 -- deep/a &&
test_all_match git diff-tree HEAD update-deep &&
test_all_match git diff-tree HEAD update-deep -- deep/a &&
# Test change outside sparse cone
tree3=$(git -C sparse-index rev-parse update-folder1^{tree}) &&
test_all_match git diff-tree $tree1 $tree3 &&
test_all_match git diff-tree $tree1 $tree3 -- folder1/a &&
test_all_match git diff-tree HEAD update-folder1 &&
test_all_match git diff-tree HEAD update-folder1 -- folder1/a &&
# Check that SKIP_WORKTREE files are not materialized
test_path_is_missing sparse-checkout/folder1/a &&
test_path_is_missing sparse-index/folder1/a &&
test_path_is_missing sparse-checkout/folder2/a &&
test_path_is_missing sparse-index/folder2/a
'
test_expect_success 'sparse-index is not expanded: diff-tree' '
init_repos &&
tree1=$(git -C sparse-index rev-parse HEAD^{tree}) &&
tree2=$(git -C sparse-index rev-parse update-deep^{tree}) &&
tree3=$(git -C sparse-index rev-parse update-folder1^{tree}) &&
ensure_not_expanded diff-tree $tree1 $tree2 &&
ensure_not_expanded diff-tree $tree1 $tree2 -- deep/a &&
ensure_not_expanded diff-tree HEAD update-deep &&
ensure_not_expanded diff-tree HEAD update-deep -- deep/a &&
ensure_not_expanded diff-tree $tree1 $tree3 &&
ensure_not_expanded diff-tree $tree1 $tree3 -- folder1/a &&
ensure_not_expanded diff-tree HEAD update-folder1 &&
ensure_not_expanded diff-tree HEAD update-folder1 -- folder1/a
'
worktree: integrate with sparse-index The index is read in 'worktree.c' at two points: 1.The 'validate_no_submodules' function, which checks if there are any submodules present in the worktree. 2.The 'check_clean_worktree' function, which verifies if a worktree is 'clean', i.e., there are no untracked or modified but uncommitted files. This is done by running the 'git status' command, and an error message is thrown if the worktree is not clean. Given that 'git status' is already sparse-aware, the function is also sparse-aware. Hence we can just set the requires-full-index to false for "git worktree". Add tests that verify that 'git worktree' behaves correctly when the sparse index is enabled and test to ensure the index is not expanded. The `p2000` tests demonstrate a ~20% execution time reduction for 'git worktree' using a sparse index: (Note:the p2000 test results didn't reflect the huge speedup because of the index reading time is minuscule comparing to the filesystem operations.) Test before after ----------------------------------------------------------------------- 2000.102: git worktree add....(full-v3) 3.15 2.82 -10.5% 2000.103: git worktree add....(full-v4) 3.14 2.84 -9.6% 2000.104: git worktree add....(sparse-v3) 2.59 2.14 -16.4% 2000.105: git worktree add....(sparse-v4) 2.10 1.57 -25.2% Helped-by: Victoria Dye <vdye@github.com> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com> Acked-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-06 17:26:33 +00:00
test_expect_success 'worktree' '
init_repos &&
write_script edit-contents <<-\EOF &&
echo text >>"$1"
EOF
for repo in full-checkout sparse-checkout sparse-index
do
worktree=${repo}-wt &&
git -C $repo worktree add ../$worktree &&
# Compare worktree content with "ls"
(cd $repo && ls) >worktree_contents &&
(cd $worktree && ls) >new_worktree_contents &&
test_cmp worktree_contents new_worktree_contents &&
# Compare index content with "ls-files --sparse"
git -C $repo ls-files --sparse >index_contents &&
git -C $worktree ls-files --sparse >new_index_contents &&
test_cmp index_contents new_index_contents &&
git -C $repo worktree remove ../$worktree || return 1
done &&
test_all_match git worktree add .worktrees/hotfix &&
run_on_all ../edit-contents .worktrees/hotfix/deep/a &&
test_all_match test_must_fail git worktree remove .worktrees/hotfix
'
test_expect_success 'worktree is not expanded' '
init_repos &&
ensure_not_expanded worktree add .worktrees/hotfix &&
ensure_not_expanded worktree remove .worktrees/hotfix
'
t1092: add tests for 'git check-attr' Add tests for `git check-attr`, make sure attribute file does get read from index when path is either inside or outside of sparse-checkout definition. Add a test named 'diff --check with pathspec outside sparse definition'. It starts by disabling the trailing whitespace and space-before-tab checks using the core. whitespace configuration option. Then, it specifically re-enables the trailing whitespace check for a file located in a sparse directory by adding a whitespace=trailing-space rule to the .gitattributes file within that directory. Next, create and populate the folder1 directory, and then add the .gitattributes file to the index. Edit the contents of folder1/a, add it to the index, and proceed to "re-sparsify" 'folder1/' with 'git sparse-checkout reapply'. Finally, use 'git diff --check --cached' to compare the 'index vs. HEAD', ensuring the correct application of the attribute rules even when the file's path is outside the sparse-checkout definition. Mark the two tests 'check-attr with pathspec outside sparse definition' and 'diff --check with pathspec outside sparse definition' as 'test_expect_failure' to reflect an existing issue where the attributes inside a sparse directory are ignored. Ensure that the 'check-attr' command fails to read the required attributes to demonstrate this expected failure. Helped-by: Victoria Dye <vdye@github.com> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-11 14:22:09 +00:00
test_expect_success 'check-attr with pathspec inside sparse definition' '
init_repos &&
echo "a -crlf myAttr" >>.gitattributes &&
run_on_all cp ../.gitattributes ./deep &&
test_all_match git check-attr -a -- deep/a &&
test_all_match git add deep/.gitattributes &&
test_all_match git check-attr -a --cached -- deep/a
'
attr.c: read attributes in a sparse directory Before this patch, git check-attr was unable to read the attributes from a .gitattributes file within a sparse directory. The original comment was operating under the assumption that users are only interested in files or directories inside the cones. Therefore, in the original code, in the case of a cone-mode sparse-checkout, we didn't load the .gitattributes file. However, this behavior can lead to missing attributes for files inside sparse directories, causing inconsistencies in file handling. To resolve this, revise 'git check-attr' to allow attribute reading for files in sparse directories from the corresponding .gitattributes files: 1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse to check if a path falls within a sparse directory. 2.If path is inside a sparse directory, employ the value of index_name_pos_sparse() to find the sparse directory containing path and path relative to sparse directory. Proceed to read attributes from the tree OID of the sparse directory using read_attr_from_blob(). 3.If path is not inside a sparse directory,ensure that attributes are fetched from the index blob with read_blob_data_from_index(). Change the test 'check-attr with pathspec outside sparse definition' to 'test_expect_success' to reflect that the attributes inside a sparse directory can now be read. Ensure that the sparse index case works correctly for git check-attr to illustrate the successful handling of attributes within sparse directories. Helped-by: Victoria Dye <vdye@github.com> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-11 14:22:10 +00:00
test_expect_success 'check-attr with pathspec outside sparse definition' '
t1092: add tests for 'git check-attr' Add tests for `git check-attr`, make sure attribute file does get read from index when path is either inside or outside of sparse-checkout definition. Add a test named 'diff --check with pathspec outside sparse definition'. It starts by disabling the trailing whitespace and space-before-tab checks using the core. whitespace configuration option. Then, it specifically re-enables the trailing whitespace check for a file located in a sparse directory by adding a whitespace=trailing-space rule to the .gitattributes file within that directory. Next, create and populate the folder1 directory, and then add the .gitattributes file to the index. Edit the contents of folder1/a, add it to the index, and proceed to "re-sparsify" 'folder1/' with 'git sparse-checkout reapply'. Finally, use 'git diff --check --cached' to compare the 'index vs. HEAD', ensuring the correct application of the attribute rules even when the file's path is outside the sparse-checkout definition. Mark the two tests 'check-attr with pathspec outside sparse definition' and 'diff --check with pathspec outside sparse definition' as 'test_expect_failure' to reflect an existing issue where the attributes inside a sparse directory are ignored. Ensure that the 'check-attr' command fails to read the required attributes to demonstrate this expected failure. Helped-by: Victoria Dye <vdye@github.com> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-11 14:22:09 +00:00
init_repos &&
echo "a -crlf myAttr" >>.gitattributes &&
run_on_sparse mkdir folder1 &&
run_on_all cp ../.gitattributes ./folder1 &&
run_on_all cp a folder1/a &&
test_all_match git check-attr -a -- folder1/a &&
git -C full-checkout add folder1/.gitattributes &&
test_sparse_match git add --sparse folder1/.gitattributes &&
test_all_match git commit -m "add .gitattributes" &&
test_sparse_match git sparse-checkout reapply &&
test_all_match git check-attr -a --cached -- folder1/a
'
attr.c: read attributes in a sparse directory Before this patch, git check-attr was unable to read the attributes from a .gitattributes file within a sparse directory. The original comment was operating under the assumption that users are only interested in files or directories inside the cones. Therefore, in the original code, in the case of a cone-mode sparse-checkout, we didn't load the .gitattributes file. However, this behavior can lead to missing attributes for files inside sparse directories, causing inconsistencies in file handling. To resolve this, revise 'git check-attr' to allow attribute reading for files in sparse directories from the corresponding .gitattributes files: 1.Utilize path_in_cone_mode_sparse_checkout() and index_name_pos_sparse to check if a path falls within a sparse directory. 2.If path is inside a sparse directory, employ the value of index_name_pos_sparse() to find the sparse directory containing path and path relative to sparse directory. Proceed to read attributes from the tree OID of the sparse directory using read_attr_from_blob(). 3.If path is not inside a sparse directory,ensure that attributes are fetched from the index blob with read_blob_data_from_index(). Change the test 'check-attr with pathspec outside sparse definition' to 'test_expect_success' to reflect that the attributes inside a sparse directory can now be read. Ensure that the sparse index case works correctly for git check-attr to illustrate the successful handling of attributes within sparse directories. Helped-by: Victoria Dye <vdye@github.com> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-11 14:22:10 +00:00
# NEEDSWORK: The 'diff --check' test is left as 'test_expect_failure' due
# to an underlying issue in oneway_diff() within diff-lib.c.
# 'do_oneway_diff()' is not called as expected for paths that could match
# inside of a sparse directory. Specifically, the 'ce_path_match()' function
# fails to recognize files inside a sparse directory (e.g., when 'folder1/'
# is a sparse directory, 'folder1/a' cannot be recognized). The goal is to
# proceed with 'do_oneway_diff()' if the pathspec could match inside of a
# sparse directory.
t1092: add tests for 'git check-attr' Add tests for `git check-attr`, make sure attribute file does get read from index when path is either inside or outside of sparse-checkout definition. Add a test named 'diff --check with pathspec outside sparse definition'. It starts by disabling the trailing whitespace and space-before-tab checks using the core. whitespace configuration option. Then, it specifically re-enables the trailing whitespace check for a file located in a sparse directory by adding a whitespace=trailing-space rule to the .gitattributes file within that directory. Next, create and populate the folder1 directory, and then add the .gitattributes file to the index. Edit the contents of folder1/a, add it to the index, and proceed to "re-sparsify" 'folder1/' with 'git sparse-checkout reapply'. Finally, use 'git diff --check --cached' to compare the 'index vs. HEAD', ensuring the correct application of the attribute rules even when the file's path is outside the sparse-checkout definition. Mark the two tests 'check-attr with pathspec outside sparse definition' and 'diff --check with pathspec outside sparse definition' as 'test_expect_failure' to reflect an existing issue where the attributes inside a sparse directory are ignored. Ensure that the 'check-attr' command fails to read the required attributes to demonstrate this expected failure. Helped-by: Victoria Dye <vdye@github.com> Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-11 14:22:09 +00:00
test_expect_failure 'diff --check with pathspec outside sparse definition' '
init_repos &&
write_script edit-contents <<-\EOF &&
echo "a " >"$1"
EOF
test_all_match git config core.whitespace -trailing-space,-space-before-tab &&
echo "a whitespace=trailing-space,space-before-tab" >>.gitattributes &&
run_on_all mkdir -p folder1 &&
run_on_all cp ../.gitattributes ./folder1 &&
test_all_match git add --sparse folder1/.gitattributes &&
run_on_all ../edit-contents folder1/a &&
test_all_match git add --sparse folder1/a &&
test_sparse_match git sparse-checkout reapply &&
test_all_match test_must_fail git diff --check --cached -- folder1/a
'
test_expect_success 'sparse-index is not expanded: check-attr' '
init_repos &&
echo "a -crlf myAttr" >>.gitattributes &&
mkdir ./sparse-index/folder1 &&
cp ./sparse-index/a ./sparse-index/folder1/a &&
cp .gitattributes ./sparse-index/deep &&
cp .gitattributes ./sparse-index/folder1 &&
git -C sparse-index add deep/.gitattributes &&
git -C sparse-index add --sparse folder1/.gitattributes &&
ensure_not_expanded check-attr -a --cached -- deep/a &&
ensure_not_expanded check-attr -a --cached -- folder1/a
'
test_done