git/t/t5304-prune.sh

368 lines
11 KiB
Bash
Raw Normal View History

#!/bin/sh
#
# Copyright (c) 2008 Johannes E. Schindelin
#
test_description='prune'
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
tests: mark tests relying on the current default for `init.defaultBranch` In addition to the manual adjustment to let the `linux-gcc` CI job run the test suite with `master` and then with `main`, this patch makes sure that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts that currently rely on the initial branch name being `master by default. To determine which test scripts to mark up, the first step was to force-set the default branch name to `master` in - all test scripts that contain the keyword `master`, - t4211, which expects `t/t4211/history.export` with a hard-coded ref to initialize the default branch, - t5560 because it sources `t/t556x_common` which uses `master`, - t8002 and t8012 because both source `t/annotate-tests.sh` which also uses `master`) This trick was performed by this command: $ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' $(git grep -l master t/t[0-9]*.sh) \ t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh After that, careful, manual inspection revealed that some of the test scripts containing the needle `master` do not actually rely on a specific default branch name: either they mention `master` only in a comment, or they initialize that branch specificially, or they do not actually refer to the current default branch. Therefore, the aforementioned modification was undone in those test scripts thusly: $ git checkout HEAD -- \ t/t0027-auto-crlf.sh t/t0060-path-utils.sh \ t/t1011-read-tree-sparse-checkout.sh \ t/t1305-config-include.sh t/t1309-early-config.sh \ t/t1402-check-ref-format.sh t/t1450-fsck.sh \ t/t2024-checkout-dwim.sh \ t/t2106-update-index-assume-unchanged.sh \ t/t3040-subprojects-basic.sh t/t3301-notes.sh \ t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \ t/t3436-rebase-more-options.sh \ t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \ t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \ t/t5511-refspec.sh t/t5526-fetch-submodules.sh \ t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \ t/t5548-push-porcelain.sh \ t/t5552-skipping-fetch-negotiator.sh \ t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \ t/t5614-clone-submodules-shallow.sh \ t/t7508-status.sh t/t7606-merge-custom.sh \ t/t9302-fast-import-unpack-limit.sh We excluded one set of test scripts in these commands, though: the range of `git p4` tests. The reason? `git p4` stores the (foreign) remote branch in the branch called `p4/master`, which is obviously not the default branch. Manual analysis revealed that only five of these tests actually require a specific default branch name to pass; They were modified thusly: $ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' t/t980[0167]*.sh t/t9811*.sh Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-18 23:44:19 +00:00
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
. ./test-lib.sh
day=$((60*60*24))
week=$(($day*7))
add_blob() {
before=$(git count-objects | sed "s/ .*//") &&
BLOB=$(echo aleph_0 | git hash-object -w --stdin) &&
BLOB_FILE=.git/objects/$(echo $BLOB | sed "s/^../&\//") &&
test $((1 + $before)) = $(git count-objects | sed "s/ .*//") &&
test_path_is_file $BLOB_FILE &&
test-tool chmtime =+0 $BLOB_FILE
}
test_expect_success setup '
>file &&
git add file &&
test_tick &&
git commit -m initial &&
git gc
'
test_expect_success 'bare repo prune is quiet without $GIT_DIR/objects/pack' '
git clone -q --shared --template= --bare . bare.git &&
rmdir bare.git/objects/pack &&
git --git-dir=bare.git prune --no-progress 2>prune.err &&
test_must_be_empty prune.err &&
rm -r bare.git prune.err
'
test_expect_success 'prune stale packs' '
orig_pack=$(echo .git/objects/pack/*.pack) &&
>.git/objects/tmp_1.pack &&
>.git/objects/tmp_2.pack &&
test-tool chmtime =-86501 .git/objects/tmp_1.pack &&
git prune --expire 1.day &&
test_path_is_file $orig_pack &&
test_path_is_file .git/objects/tmp_2.pack &&
test_path_is_missing .git/objects/tmp_1.pack
'
gc: call "prune --expire 2.weeks.ago" by default The only reason we did not call "prune" in git-gc was that it is an inherently dangerous operation: if there is a commit going on, you will prune loose objects that were just created, and are, in fact, needed by the commit object just about to be created. Since it is dangerous, we told users so. That led to many users not even daring to run it when it was actually safe. Besides, they are users, and should not have to remember such details as when to call git-gc with --prune, or to call git-prune directly. Of course, the consequence was that "git gc --auto" gets triggered much more often than we would like, since unreferenced loose objects (such as left-overs from a rebase or a reset --hard) were never pruned. Alas, git-prune recently learnt the option --expire <minimum-age>, which makes it a much safer operation. This allows us to call prune from git-gc, with a grace period of 2 weeks for the unreferenced loose objects (this value was determined in a discussion on the git list as a safe one). If you want to override this grace period, just set the config variable gc.pruneExpire to a different value; an example would be [gc] pruneExpire = 6.months.ago or even "never", if you feel really paranoid. Note that this new behaviour makes "--prune" be a no-op. While adding a test to t5304-prune.sh (since it really tests the implicit call to "prune"), also the original test for "prune --expire" was moved there from t1410-reflog.sh, where it did not belong. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2008-03-12 20:55:47 +00:00
test_expect_success 'prune --expire' '
add_blob &&
gc: call "prune --expire 2.weeks.ago" by default The only reason we did not call "prune" in git-gc was that it is an inherently dangerous operation: if there is a commit going on, you will prune loose objects that were just created, and are, in fact, needed by the commit object just about to be created. Since it is dangerous, we told users so. That led to many users not even daring to run it when it was actually safe. Besides, they are users, and should not have to remember such details as when to call git-gc with --prune, or to call git-prune directly. Of course, the consequence was that "git gc --auto" gets triggered much more often than we would like, since unreferenced loose objects (such as left-overs from a rebase or a reset --hard) were never pruned. Alas, git-prune recently learnt the option --expire <minimum-age>, which makes it a much safer operation. This allows us to call prune from git-gc, with a grace period of 2 weeks for the unreferenced loose objects (this value was determined in a discussion on the git list as a safe one). If you want to override this grace period, just set the config variable gc.pruneExpire to a different value; an example would be [gc] pruneExpire = 6.months.ago or even "never", if you feel really paranoid. Note that this new behaviour makes "--prune" be a no-op. While adding a test to t5304-prune.sh (since it really tests the implicit call to "prune"), also the original test for "prune --expire" was moved there from t1410-reflog.sh, where it did not belong. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2008-03-12 20:55:47 +00:00
git prune --expire=1.hour.ago &&
test $((1 + $before)) = $(git count-objects | sed "s/ .*//") &&
test_path_is_file $BLOB_FILE &&
test-tool chmtime =-86500 $BLOB_FILE &&
gc: call "prune --expire 2.weeks.ago" by default The only reason we did not call "prune" in git-gc was that it is an inherently dangerous operation: if there is a commit going on, you will prune loose objects that were just created, and are, in fact, needed by the commit object just about to be created. Since it is dangerous, we told users so. That led to many users not even daring to run it when it was actually safe. Besides, they are users, and should not have to remember such details as when to call git-gc with --prune, or to call git-prune directly. Of course, the consequence was that "git gc --auto" gets triggered much more often than we would like, since unreferenced loose objects (such as left-overs from a rebase or a reset --hard) were never pruned. Alas, git-prune recently learnt the option --expire <minimum-age>, which makes it a much safer operation. This allows us to call prune from git-gc, with a grace period of 2 weeks for the unreferenced loose objects (this value was determined in a discussion on the git list as a safe one). If you want to override this grace period, just set the config variable gc.pruneExpire to a different value; an example would be [gc] pruneExpire = 6.months.ago or even "never", if you feel really paranoid. Note that this new behaviour makes "--prune" be a no-op. While adding a test to t5304-prune.sh (since it really tests the implicit call to "prune"), also the original test for "prune --expire" was moved there from t1410-reflog.sh, where it did not belong. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2008-03-12 20:55:47 +00:00
git prune --expire 1.day &&
test $before = $(git count-objects | sed "s/ .*//") &&
test_path_is_missing $BLOB_FILE
gc: call "prune --expire 2.weeks.ago" by default The only reason we did not call "prune" in git-gc was that it is an inherently dangerous operation: if there is a commit going on, you will prune loose objects that were just created, and are, in fact, needed by the commit object just about to be created. Since it is dangerous, we told users so. That led to many users not even daring to run it when it was actually safe. Besides, they are users, and should not have to remember such details as when to call git-gc with --prune, or to call git-prune directly. Of course, the consequence was that "git gc --auto" gets triggered much more often than we would like, since unreferenced loose objects (such as left-overs from a rebase or a reset --hard) were never pruned. Alas, git-prune recently learnt the option --expire <minimum-age>, which makes it a much safer operation. This allows us to call prune from git-gc, with a grace period of 2 weeks for the unreferenced loose objects (this value was determined in a discussion on the git list as a safe one). If you want to override this grace period, just set the config variable gc.pruneExpire to a different value; an example would be [gc] pruneExpire = 6.months.ago or even "never", if you feel really paranoid. Note that this new behaviour makes "--prune" be a no-op. While adding a test to t5304-prune.sh (since it really tests the implicit call to "prune"), also the original test for "prune --expire" was moved there from t1410-reflog.sh, where it did not belong. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2008-03-12 20:55:47 +00:00
'
test_expect_success 'gc: implicit prune --expire' '
add_blob &&
test-tool chmtime =-$((2*$week-30)) $BLOB_FILE &&
git gc --no-cruft &&
test $((1 + $before)) = $(git count-objects | sed "s/ .*//") &&
test_path_is_file $BLOB_FILE &&
test-tool chmtime =-$((2*$week+1)) $BLOB_FILE &&
git gc --no-cruft &&
test $before = $(git count-objects | sed "s/ .*//") &&
test_path_is_missing $BLOB_FILE
gc: call "prune --expire 2.weeks.ago" by default The only reason we did not call "prune" in git-gc was that it is an inherently dangerous operation: if there is a commit going on, you will prune loose objects that were just created, and are, in fact, needed by the commit object just about to be created. Since it is dangerous, we told users so. That led to many users not even daring to run it when it was actually safe. Besides, they are users, and should not have to remember such details as when to call git-gc with --prune, or to call git-prune directly. Of course, the consequence was that "git gc --auto" gets triggered much more often than we would like, since unreferenced loose objects (such as left-overs from a rebase or a reset --hard) were never pruned. Alas, git-prune recently learnt the option --expire <minimum-age>, which makes it a much safer operation. This allows us to call prune from git-gc, with a grace period of 2 weeks for the unreferenced loose objects (this value was determined in a discussion on the git list as a safe one). If you want to override this grace period, just set the config variable gc.pruneExpire to a different value; an example would be [gc] pruneExpire = 6.months.ago or even "never", if you feel really paranoid. Note that this new behaviour makes "--prune" be a no-op. While adding a test to t5304-prune.sh (since it really tests the implicit call to "prune"), also the original test for "prune --expire" was moved there from t1410-reflog.sh, where it did not belong. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2008-03-12 20:55:47 +00:00
'
test_expect_success 'gc: refuse to start with invalid gc.pruneExpire' '
test_when_finished "rm -rf repo" &&
git init repo &&
>repo/.git/config &&
git -C repo config gc.pruneExpire invalid &&
cat >expect <<-\EOF &&
error: Invalid gc.pruneexpire: '\''invalid'\''
fatal: bad config variable '\''gc.pruneexpire'\'' in file '\''.git/config'\'' at line 2
EOF
test_must_fail git -C repo gc 2>actual &&
test_cmp expect actual
gc: call "prune --expire 2.weeks.ago" by default The only reason we did not call "prune" in git-gc was that it is an inherently dangerous operation: if there is a commit going on, you will prune loose objects that were just created, and are, in fact, needed by the commit object just about to be created. Since it is dangerous, we told users so. That led to many users not even daring to run it when it was actually safe. Besides, they are users, and should not have to remember such details as when to call git-gc with --prune, or to call git-prune directly. Of course, the consequence was that "git gc --auto" gets triggered much more often than we would like, since unreferenced loose objects (such as left-overs from a rebase or a reset --hard) were never pruned. Alas, git-prune recently learnt the option --expire <minimum-age>, which makes it a much safer operation. This allows us to call prune from git-gc, with a grace period of 2 weeks for the unreferenced loose objects (this value was determined in a discussion on the git list as a safe one). If you want to override this grace period, just set the config variable gc.pruneExpire to a different value; an example would be [gc] pruneExpire = 6.months.ago or even "never", if you feel really paranoid. Note that this new behaviour makes "--prune" be a no-op. While adding a test to t5304-prune.sh (since it really tests the implicit call to "prune"), also the original test for "prune --expire" was moved there from t1410-reflog.sh, where it did not belong. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2008-03-12 20:55:47 +00:00
'
test_expect_success 'gc: start with ok gc.pruneExpire' '
git config gc.pruneExpire 2.days.ago &&
git gc --no-cruft
gc: call "prune --expire 2.weeks.ago" by default The only reason we did not call "prune" in git-gc was that it is an inherently dangerous operation: if there is a commit going on, you will prune loose objects that were just created, and are, in fact, needed by the commit object just about to be created. Since it is dangerous, we told users so. That led to many users not even daring to run it when it was actually safe. Besides, they are users, and should not have to remember such details as when to call git-gc with --prune, or to call git-prune directly. Of course, the consequence was that "git gc --auto" gets triggered much more often than we would like, since unreferenced loose objects (such as left-overs from a rebase or a reset --hard) were never pruned. Alas, git-prune recently learnt the option --expire <minimum-age>, which makes it a much safer operation. This allows us to call prune from git-gc, with a grace period of 2 weeks for the unreferenced loose objects (this value was determined in a discussion on the git list as a safe one). If you want to override this grace period, just set the config variable gc.pruneExpire to a different value; an example would be [gc] pruneExpire = 6.months.ago or even "never", if you feel really paranoid. Note that this new behaviour makes "--prune" be a no-op. While adding a test to t5304-prune.sh (since it really tests the implicit call to "prune"), also the original test for "prune --expire" was moved there from t1410-reflog.sh, where it did not belong. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2008-03-12 20:55:47 +00:00
'
test_expect_success 'prune: prune nonsense parameters' '
test_must_fail git prune garbage &&
test_must_fail git prune --- &&
test_must_fail git prune --no-such-option
'
test_expect_success 'prune: prune unreachable heads' '
git config core.logAllRefUpdates false &&
>file2 &&
git add file2 &&
git commit -m temporary &&
tmp_head=$(git rev-list -1 HEAD) &&
git reset HEAD^ &&
git reflog expire --all &&
git prune &&
test_must_fail git reset $tmp_head --
'
test_expect_success 'prune: do not prune detached HEAD with no reflog' '
git checkout --detach --quiet &&
git commit --allow-empty -m "detached commit" &&
git reflog expire --all &&
git prune -n >prune_actual &&
tests: use 'test_must_be_empty' instead of 'test_cmp <empty> <out>' Using 'test_must_be_empty' is shorter and more idiomatic than >empty && test_cmp empty out as it saves the creation of an empty file. Furthermore, sometimes the expected empty file doesn't have such a descriptive name like 'empty', and its creation is far away from the place where it's finally used for comparison (e.g. in 't7600-merge.sh', where two expected empty files are created in the 'setup' test, but are used only about 500 lines later). These cases were found by instrumenting 'test_cmp' to error out the test script when it's used to compare empty files, and then converted manually. Note that even after this patch there still remain a lot of cases where we use 'test_cmp' to check empty files: - Sometimes the expected output is not hard-coded in the test, but 'test_cmp' is used to ensure that two similar git commands produce the same output, and that output happens to be empty, e.g. the test 'submodule update --merge - ignores --merge for new submodules' in 't7406-submodule-update.sh'. - Repetitive common tasks, including preparing the expected results and running 'test_cmp', are often extracted into a helper function, and some of this helper's callsites expect no output. - For the same reason as above, the whole 'test_expect_success' block is within a helper function, e.g. in 't3070-wildmatch.sh'. - Or 'test_cmp' is invoked in a loop, e.g. the test 'cvs update (-p)' in 't9400-git-cvsserver-server.sh'. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-19 21:57:25 +00:00
test_must_be_empty prune_actual
'
test_expect_success 'prune: prune former HEAD after checking out branch' '
head_oid=$(git rev-parse HEAD) &&
git checkout --quiet main &&
git reflog expire --all &&
git prune -v >prune_actual &&
grep "$head_oid" prune_actual
'
test_expect_success 'prune: do not prune heads listed as an argument' '
>file2 &&
git add file2 &&
git commit -m temporary &&
tmp_head=$(git rev-list -1 HEAD) &&
git reset HEAD^ &&
git prune -- $tmp_head &&
git reset $tmp_head --
'
test_expect_success 'gc --no-prune' '
add_blob &&
test-tool chmtime =-$((5001*$day)) $BLOB_FILE &&
git config gc.pruneExpire 2.days.ago &&
git gc --no-prune --no-cruft &&
test 1 = $(git count-objects | sed "s/ .*//") &&
test_path_is_file $BLOB_FILE
'
test_expect_success 'gc respects gc.pruneExpire' '
git config gc.pruneExpire 5002.days.ago &&
git gc --no-cruft &&
test_path_is_file $BLOB_FILE &&
git config gc.pruneExpire 5000.days.ago &&
git gc --no-cruft &&
test_path_is_missing $BLOB_FILE
'
test_expect_success 'gc --prune=<date>' '
add_blob &&
test-tool chmtime =-$((5001*$day)) $BLOB_FILE &&
git gc --prune=5002.days.ago --no-cruft &&
test_path_is_file $BLOB_FILE &&
git gc --prune=5000.days.ago --no-cruft &&
test_path_is_missing $BLOB_FILE
'
test_expect_success 'gc --prune=never' '
add_blob &&
git gc --prune=never --no-cruft &&
test_path_is_file $BLOB_FILE &&
git gc --prune=now --no-cruft &&
test_path_is_missing $BLOB_FILE
'
test_expect_success 'gc respects gc.pruneExpire=never' '
git config gc.pruneExpire never &&
add_blob &&
git gc --no-cruft &&
test_path_is_file $BLOB_FILE &&
git config gc.pruneExpire now &&
git gc --no-cruft &&
test_path_is_missing $BLOB_FILE
'
test_expect_success 'prune --expire=never' '
add_blob &&
git prune --expire=never &&
test_path_is_file $BLOB_FILE &&
git prune &&
test_path_is_missing $BLOB_FILE
'
test_expect_success 'gc: prune old objects after local clone' '
add_blob &&
test-tool chmtime =-$((2*$week+1)) $BLOB_FILE &&
git clone --no-hardlinks . aclone &&
(
cd aclone &&
test 1 = $(git count-objects | sed "s/ .*//") &&
test_path_is_file $BLOB_FILE &&
git gc --prune --no-cruft &&
test 0 = $(git count-objects | sed "s/ .*//") &&
test_path_is_missing $BLOB_FILE
)
'
test_expect_success 'garbage report in count-objects -v' '
test_when_finished "rm -f .git/objects/pack/fake*" &&
test_when_finished "rm -f .git/objects/pack/foo*" &&
>.git/objects/pack/foo &&
>.git/objects/pack/foo.bar &&
>.git/objects/pack/foo.keep &&
>.git/objects/pack/foo.pack &&
>.git/objects/pack/fake.bar &&
>.git/objects/pack/fake.keep &&
>.git/objects/pack/fake.pack &&
>.git/objects/pack/fake.idx &&
>.git/objects/pack/fake2.keep &&
>.git/objects/pack/fake3.idx &&
git count-objects -v 2>stderr &&
grep "index file .git/objects/pack/fake.idx is too small" stderr &&
grep "^warning:" stderr | sort >actual &&
cat >expected <<\EOF &&
warning: garbage found: .git/objects/pack/fake.bar
warning: garbage found: .git/objects/pack/foo
warning: garbage found: .git/objects/pack/foo.bar
warning: no corresponding .idx or .pack: .git/objects/pack/fake2.keep
warning: no corresponding .idx: .git/objects/pack/foo.keep
warning: no corresponding .idx: .git/objects/pack/foo.pack
warning: no corresponding .pack: .git/objects/pack/fake3.idx
EOF
test_cmp expected actual
'
test_expect_success 'clean pack garbage with gc' '
test_when_finished "rm -f .git/objects/pack/fake*" &&
test_when_finished "rm -f .git/objects/pack/foo*" &&
>.git/objects/pack/foo.keep &&
>.git/objects/pack/foo.pack &&
>.git/objects/pack/fake.idx &&
>.git/objects/pack/fake2.keep &&
>.git/objects/pack/fake2.idx &&
>.git/objects/pack/fake3.keep &&
git gc --no-cruft &&
git count-objects -v 2>stderr &&
grep "^warning:" stderr | sort >actual &&
cat >expected <<\EOF &&
warning: no corresponding .idx or .pack: .git/objects/pack/fake3.keep
warning: no corresponding .idx: .git/objects/pack/foo.keep
warning: no corresponding .idx: .git/objects/pack/foo.pack
EOF
test_cmp expected actual
'
test_expect_success 'prune .git/shallow' '
oid=$(echo hi|git commit-tree HEAD^{tree}) &&
echo $oid >.git/shallow &&
git prune --dry-run >out &&
grep $oid .git/shallow &&
grep $oid out &&
git prune &&
test_path_is_missing .git/shallow
'
prune: lazily perform reachability traversal The general strategy of "git prune" is to do a full reachability walk, then for each loose object see if we found it in our walk. But if we don't have any loose objects, we don't need to do the expensive walk in the first place. This patch postpones that walk until the first time we need to see its results. Note that this is really a specific case of a more general optimization, which is that we could traverse only far enough to find the object under consideration (i.e., stop the traversal when we find it, then pick up again when asked about the next object, etc). That could save us in some instances from having to do a full walk. But it's actually a bit tricky to do with our traversal code, and you'd need to do a full walk anyway if you have even a single unreachable object (which you generally do, if any objects are actually left after running git-repack). So in practice this lazy-load of the full walk catches one easy but common case (i.e., you've just repacked via git-gc, and there's nothing unreachable). The perf script is fairly contrived, but it does show off the improvement: Test HEAD^ HEAD ------------------------------------------------------------------------- 5304.4: prune with no objects 3.66(3.60+0.05) 0.00(0.00+0.00) -100.0% and would let us know if we accidentally regress this optimization. Note also that we need to take special care with prune_shallow(), which relies on us having performed the traversal. So this optimization can only kick in for a non-shallow repository. Since this is easy to get wrong and is not covered by existing tests, let's add an extra test to t5304 that covers this case explicitly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-14 04:35:22 +00:00
test_expect_success 'prune .git/shallow when there are no loose objects' '
oid=$(echo hi|git commit-tree HEAD^{tree}) &&
echo $oid >.git/shallow &&
git update-ref refs/heads/shallow-tip $oid &&
prune: lazily perform reachability traversal The general strategy of "git prune" is to do a full reachability walk, then for each loose object see if we found it in our walk. But if we don't have any loose objects, we don't need to do the expensive walk in the first place. This patch postpones that walk until the first time we need to see its results. Note that this is really a specific case of a more general optimization, which is that we could traverse only far enough to find the object under consideration (i.e., stop the traversal when we find it, then pick up again when asked about the next object, etc). That could save us in some instances from having to do a full walk. But it's actually a bit tricky to do with our traversal code, and you'd need to do a full walk anyway if you have even a single unreachable object (which you generally do, if any objects are actually left after running git-repack). So in practice this lazy-load of the full walk catches one easy but common case (i.e., you've just repacked via git-gc, and there's nothing unreachable). The perf script is fairly contrived, but it does show off the improvement: Test HEAD^ HEAD ------------------------------------------------------------------------- 5304.4: prune with no objects 3.66(3.60+0.05) 0.00(0.00+0.00) -100.0% and would let us know if we accidentally regress this optimization. Note also that we need to take special care with prune_shallow(), which relies on us having performed the traversal. So this optimization can only kick in for a non-shallow repository. Since this is easy to get wrong and is not covered by existing tests, let's add an extra test to t5304 that covers this case explicitly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-14 04:35:22 +00:00
git repack -ad &&
# verify assumption that all loose objects are gone
git count-objects | grep ^0 &&
git prune &&
echo $oid >expect &&
prune: lazily perform reachability traversal The general strategy of "git prune" is to do a full reachability walk, then for each loose object see if we found it in our walk. But if we don't have any loose objects, we don't need to do the expensive walk in the first place. This patch postpones that walk until the first time we need to see its results. Note that this is really a specific case of a more general optimization, which is that we could traverse only far enough to find the object under consideration (i.e., stop the traversal when we find it, then pick up again when asked about the next object, etc). That could save us in some instances from having to do a full walk. But it's actually a bit tricky to do with our traversal code, and you'd need to do a full walk anyway if you have even a single unreachable object (which you generally do, if any objects are actually left after running git-repack). So in practice this lazy-load of the full walk catches one easy but common case (i.e., you've just repacked via git-gc, and there's nothing unreachable). The perf script is fairly contrived, but it does show off the improvement: Test HEAD^ HEAD ------------------------------------------------------------------------- 5304.4: prune with no objects 3.66(3.60+0.05) 0.00(0.00+0.00) -100.0% and would let us know if we accidentally regress this optimization. Note also that we need to take special care with prune_shallow(), which relies on us having performed the traversal. So this optimization can only kick in for a non-shallow repository. Since this is easy to get wrong and is not covered by existing tests, let's add an extra test to t5304 that covers this case explicitly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-14 04:35:22 +00:00
test_cmp expect .git/shallow
'
test_expect_success 'prune: handle alternate object database' '
test_create_repo A &&
git -C A commit --allow-empty -m "initial commit" &&
git clone --shared A B &&
git -C B commit --allow-empty -m "next commit" &&
git -C B prune
'
test_expect_success 'prune: handle index in multiple worktrees' '
git worktree add second-worktree &&
echo "new blob for second-worktree" >second-worktree/blob &&
git -C second-worktree add blob &&
git prune --expire=now &&
git -C second-worktree show :blob >actual &&
test_cmp second-worktree/blob actual
'
test_expect_success 'prune: handle HEAD in multiple worktrees' '
git worktree add --detach third-worktree &&
echo "new blob for third-worktree" >third-worktree/blob &&
git -C third-worktree add blob &&
git -C third-worktree commit -m "third" &&
rm .git/worktrees/third-worktree/index &&
test_must_fail git -C third-worktree show :blob &&
git prune --expire=now &&
git -C third-worktree show HEAD:blob >actual &&
test_cmp third-worktree/blob actual
'
test_expect_success 'prune: handle HEAD reflog in multiple worktrees' '
git config core.logAllRefUpdates true &&
echo "lost blob for third-worktree" >expected &&
(
cd third-worktree &&
cat ../expected >blob &&
git add blob &&
git commit -m "second commit in third" &&
git clean -f && # Remove untracked left behind by deleting index
git reset --hard HEAD^
) &&
git prune --expire=now &&
oid=`git hash-object expected` &&
git -C third-worktree show "$oid" >actual &&
test_cmp expected actual
'
2018-04-21 03:13:13 +00:00
test_expect_success 'prune: handle expire option correctly' '
test_must_fail git prune --expire 2>error &&
test_i18ngrep "requires a value" error &&
test_must_fail git prune --expire=nyah 2>error &&
test_i18ngrep "malformed expiration" error &&
git prune --no-expire
'
test_expect_success 'trivial prune with bitmaps enabled' '
git repack -adb &&
blob=$(echo bitmap-unreachable-blob | git hash-object -w --stdin) &&
git prune --expire=now &&
git cat-file -e HEAD &&
test_must_fail git cat-file -e $blob
'
prune: save reachable-from-recent objects with bitmaps We pass our prune expiration to mark_reachable_objects(), which will traverse not only the reachable objects, but consider any recent ones as tips for reachability; see d3038d22f9 (prune: keep objects reachable from recent objects, 2014-10-15) for details. However, this interacts badly with the bitmap code path added in fde67d6896 (prune: use bitmaps for reachability traversal, 2019-02-13). If we hit the bitmap-optimized path, we return immediately to avoid the regular traversal, accidentally skipping the "also traverse recent" code. Instead, we should do an if-else for the bitmap versus regular traversal, and then follow up with the "recent" traversal in either case. This reuses the "rev_info" for a bitmap and then a regular traversal, but that should work OK (the bitmap code clears the pending array in the usual way, just like a regular traversal would). Note that I dropped the comment above the regular traversal here. It has little explanatory value, and makes the if-else logic much harder to read. Here are a few variants that I rejected: - it seems like both the reachability and recent traversals could be done in a single traversal. This was rejected by d3038d22f9 (prune: keep objects reachable from recent objects, 2014-10-15), though the balance may be different when using bitmaps. However, there's a subtle correctness issue, too: we use revs->ignore_missing_links for the recent traversal, but not the reachability one. - we could try using bitmaps for the recent traversal, too, which could possibly improve performance. But it would require some fixes in the bitmap code, which uses ignore_missing_links for its own purposes. Plus it would probably not help all that much in practice. We use the reachable tips to generate bitmaps, so those objects are likely not covered by bitmaps (unless they just became unreachable). And in general, we expect the set of unreachable objects to be much smaller anyway, so there's less to gain. The test in t5304 detects the bug and confirms the fix. I also beefed up the tests in t6501, which covers the mtime-checking code more thoroughly, to handle the bitmap case (in addition to just "loose" and "packed" cases). Interestingly, this test doesn't actually detect the bug, because it is running "git gc", and not "prune" directly. And "gc" will call "repack" first, which does not suffer the same bug. So the old-but-reachable-from-recent objects get scooped up into the new pack along with the actually-recent objects, which gives both a recent mtime. But it seemed prudent to get more coverage of the bitmap case for related code. Reported-by: David Emett <dave@sp4m.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-28 15:42:43 +00:00
test_expect_success 'old reachable-from-recent retained with bitmaps' '
git repack -adb &&
to_drop=$(echo bitmap-from-recent-1 | git hash-object -w --stdin) &&
test-tool chmtime -86400 .git/objects/$(test_oid_to_path $to_drop) &&
to_save=$(echo bitmap-from-recent-2 | git hash-object -w --stdin) &&
test-tool chmtime -86400 .git/objects/$(test_oid_to_path $to_save) &&
tree=$(printf "100644 blob $to_save\tfile\n" | git mktree) &&
test-tool chmtime -86400 .git/objects/$(test_oid_to_path $tree) &&
commit=$(echo foo | git commit-tree $tree) &&
git prune --expire=12.hours.ago &&
git cat-file -e $commit &&
git cat-file -e $tree &&
git cat-file -e $to_save &&
test_must_fail git cat-file -e $to_drop
'
gc: introduce `gc.recentObjectsHook` This patch introduces a new multi-valued configuration option, `gc.recentObjectsHook` as a means to mark certain objects as recent (and thus exempt from garbage collection), regardless of their age. When performing a garbage collection operation on a repository with unreachable objects, Git makes its decision on what to do with those object(s) based on how recent the objects are or not. Generally speaking, unreachable-but-recent objects stay in the repository, and older objects are discarded. However, we have no convenient way to keep certain precious, unreachable objects around in the repository, even if they have aged out and would be pruned. Our options today consist of: - Point references at the reachability tips of any objects you consider precious, which may be undesirable or infeasible if there are many such objects. - Track them via the reflog, which may be undesirable since the reflog's lifetime is limited to that of the reference it's tracking (and callers may want to keep those unreachable objects around for longer). - Extend the grace period, which may keep around other objects that the caller *does* want to discard. - Manually modify the mtimes of objects you want to keep. If those objects are already loose, this is easy enough to do (you can just enumerate and `touch -m` each one). But if they are packed, you will either end up modifying the mtimes of *all* objects in that pack, or be forced to write out a loose copy of that object, both of which may be undesirable. Even worse, if they are in a cruft pack, that requires modifying its `*.mtimes` file by hand, since there is no exposed plumbing for this. - Force the caller to construct the pack of objects they want to keep themselves, and then mark the pack as kept by adding a ".keep" file. This works, but is burdensome for the caller, and having extra packs is awkward as you roll forward your cruft pack. This patch introduces a new option to the above list via the `gc.recentObjectsHook` configuration, which allows the caller to specify a program (or set of programs) whose output is treated as a set of objects to treat as recent, regardless of their true age. The implementation is straightforward. Git enumerates recent objects via `add_unseen_recent_objects_to_traversal()`, which enumerates loose and packed objects, and eventually calls add_recent_object() on any objects for which `want_recent_object()`'s conditions are met. This patch modifies the recency condition from simply "is the mtime of this object more recent than the cutoff?" to "[...] or, is this object mentioned by at least one `gc.recentObjectsHook`?". Depending on whether or not we are generating a cruft pack, this allows the caller to do one of two things: - If generating a cruft pack, the caller is able to retain additional objects via the cruft pack, even if they would have otherwise been pruned due to their age. - If not generating a cruft pack, the caller is likewise able to retain additional objects as loose. A potential alternative here is to introduce a new mode to alter the contents of the reachable pack instead of the cruft one. One could imagine a new option to `pack-objects`, say `--extra-reachable-tips` that does the same thing as above, adding the visited set of objects along the traversal to the pack. But this has the unfortunate side-effect of altering the reachability closure of that pack. If parts of the unreachable object graph mentioned by one or more of the "extra reachable tips" programs is not closed, then the resulting pack won't be either. This makes it impossible in the general case to write out reachability bitmaps for that pack, since closure is a requirement there. Instead, keep these unreachable objects in the cruft pack (or set of unreachable, loose objects) instead, to ensure that we can continue to have a pack containing just reachable objects, which is always safe to write a bitmap over. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-07 22:58:17 +00:00
test_expect_success 'gc.recentObjectsHook' '
add_blob &&
test-tool chmtime =-86500 $BLOB_FILE &&
write_script precious-objects <<-EOF &&
echo $BLOB
EOF
test_config gc.recentObjectsHook ./precious-objects &&
git prune --expire=now &&
git cat-file -p $BLOB
'
test_done