2008-05-10 04:01:55 +00:00
|
|
|
#!/bin/sh
|
|
|
|
|
2008-09-03 08:59:33 +00:00
|
|
|
test_description='git repack works correctly'
|
2008-05-10 04:01:55 +00:00
|
|
|
|
2020-11-18 23:44:40 +00:00
|
|
|
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
|
tests: mark tests relying on the current default for `init.defaultBranch`
In addition to the manual adjustment to let the `linux-gcc` CI job run
the test suite with `master` and then with `main`, this patch makes sure
that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts
that currently rely on the initial branch name being `master` by default.
To determine which test scripts to mark up, the first step was to
force-set the default branch name to `master` in
- all test scripts that contain the keyword `master`,
- t4211, which expects `t/t4211/history.export` with a hard-coded ref to
initialize the default branch,
- t5560 because it sources `t/t556x_common` which uses `master`,
- t8002 and t8012 because both source `t/annotate-tests.sh` which also
uses `master`.
This trick was performed by this command:
$ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
' $(git grep -l master t/t[0-9]*.sh) \
t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh
After that, careful, manual inspection revealed that some of the test
scripts containing the needle `master` do not actually rely on a
specific default branch name: either they mention `master` only in a
comment, or they initialize that branch specifically, or they do not
actually refer to the current default branch. Therefore, the
aforementioned modification was undone in those test scripts thusly:
$ git checkout HEAD -- \
t/t0027-auto-crlf.sh t/t0060-path-utils.sh \
t/t1011-read-tree-sparse-checkout.sh \
t/t1305-config-include.sh t/t1309-early-config.sh \
t/t1402-check-ref-format.sh t/t1450-fsck.sh \
t/t2024-checkout-dwim.sh \
t/t2106-update-index-assume-unchanged.sh \
t/t3040-subprojects-basic.sh t/t3301-notes.sh \
t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \
t/t3436-rebase-more-options.sh \
t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \
t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \
t/t5511-refspec.sh t/t5526-fetch-submodules.sh \
t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \
t/t5548-push-porcelain.sh \
t/t5552-skipping-fetch-negotiator.sh \
t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \
t/t5614-clone-submodules-shallow.sh \
t/t7508-status.sh t/t7606-merge-custom.sh \
t/t9302-fast-import-unpack-limit.sh
We excluded one set of test scripts in these commands, though: the range
of `git p4` tests. The reason? `git p4` stores the (foreign) remote
branch in the branch called `p4/master`, which is obviously not the
default branch. Manual analysis revealed that only five of these tests
actually require a specific default branch name to pass; they were
modified thusly:
$ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
' t/t980[0167]*.sh t/t9811*.sh
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
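As a rough illustration of what the `sed` commands in this message do, the sketch below applies the same kind of insertion to a script read from standard input instead of editing files in place. The function name `mark_default_branch` is hypothetical, and the pattern is narrowed to `test-lib.sh` only; the original command also matches several `lib-*.sh` files and relies on GNU `sed` features (`\|` alternation and `-i`).

```shell
#!/bin/sh
# Hypothetical helper (not part of Git's test suite): print the script
# given on stdin with the two marker lines inserted immediately before
# the line that sources test-lib.sh.
mark_default_branch () {
	# The inserted text must start in column 0: some sed
	# implementations strip leading whitespace from "i\" text.
	sed '/^ *\. \.\/test-lib\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
'
}
```

Feeding it a three-line script (`#!/bin/sh`, `. ./test-lib.sh`, `test_done`) should yield five lines, with the assignment and the `export` placed directly above the `. ./test-lib.sh` line.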
2020-11-18 23:44:19 +00:00
|
|
|
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
|
|
|
|
|
tests: mark tests as passing with SANITIZE=leak
When the "ab/various-leak-fixes" topic was merged in [1] only t6021
would fail if the tests were run in the
"GIT_TEST_PASSING_SANITIZE_LEAK=check" mode, i.e. to check whether we
marked all leak-free tests with "TEST_PASSES_SANITIZE_LEAK=true".
Since then, various tests have started to pass under SANITIZE=leak.
Let's mark those as passing. The commits at which they started to
pass, narrowed down with "git bisect", are:
- t5317-pack-objects-filter-objects.sh: In
faebba436e6 (list-objects-filter: plug pattern_list leak, 2022-12-01).
- t3210-pack-refs.sh, t5613-info-alternate.sh,
t7403-submodule-sync.sh: In 189e97bc4ba (diff: remove parseopts member
from struct diff_options, 2022-12-01).
- t1408-packed-refs.sh: In ab91f6b7c42 (Merge branch
'rs/diff-parseopts', 2022-12-19).
- t0023-crlf-am.sh, t4152-am-subjects.sh, t4254-am-corrupt.sh,
t4256-am-format-flowed.sh, t4257-am-interactive.sh,
t5403-post-checkout-hook.sh: In a658e881c13 (am: don't pass strvec to
apply_parse_options(), 2022-12-13)
- t1301-shared-repo.sh, t1302-repo-version.sh: In b07a819c05f (reflog:
clear leftovers in reflog_expiry_cleanup(), 2022-12-13).
- t1304-default-acl.sh, t1410-reflog.sh,
t5330-no-lazy-fetch-with-commit-graph.sh, t5502-quickfetch.sh,
t5604-clone-reference.sh, t6014-rev-list-all.sh,
t7701-repack-unpack-unreachable.sh: In b0c61be3209 (Merge branch
'rs/reflog-expiry-cleanup', 2022-12-26)
- t3800-mktag.sh, t5302-pack-index.sh, t5306-pack-nobase.sh,
t5573-pull-verify-signatures.sh, t7612-merge-verify-signatures.sh: In
69bbbe484ba (hash-object: use fsck for object checks, 2023-01-18).
- t1451-fsck-buffer.sh: In 8e4309038f0 (fsck: do not assume
NUL-termination of buffers, 2023-01-19).
- t6501-freshen-objects.sh: In abf2bb895b4 (Merge branch
'jk/hash-object-fsck', 2023-01-30)
1. 9ea1378d046 (Merge branch 'ab/various-leak-fixes', 2022-12-14)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-06 23:07:36 +00:00
|
|
|
TEST_PASSES_SANITIZE_LEAK=true
|
2008-05-10 04:01:55 +00:00
|
|
|
. ./test-lib.sh
|
|
|
|
|
2008-06-29 00:25:05 +00:00
|
|
|
fsha1=
|
|
|
|
csha1=
|
|
|
|
tsha1=
|
|
|
|
|
2008-11-13 20:11:46 +00:00
|
|
|
test_expect_success '-A with -d option leaves unreachable objects unpacked' '
|
2008-05-10 04:01:55 +00:00
|
|
|
echo content > file1 &&
|
|
|
|
git add . &&
|
2010-04-14 22:09:57 +00:00
|
|
|
test_tick &&
|
2008-05-10 04:01:55 +00:00
|
|
|
git commit -m initial_commit &&
|
|
|
|
# create a transient branch with unique content
|
|
|
|
git checkout -b transient_branch &&
|
|
|
|
echo more content >> file1 &&
|
|
|
|
# record the objects created in the database for file, commit, tree
|
|
|
|
fsha1=$(git hash-object file1) &&
|
2010-04-14 22:09:57 +00:00
|
|
|
test_tick &&
|
2008-05-10 04:01:55 +00:00
|
|
|
git commit -a -m more_content &&
|
|
|
|
csha1=$(git rev-parse HEAD^{commit}) &&
|
|
|
|
tsha1=$(git rev-parse HEAD^{tree}) &&
|
2020-11-18 23:44:40 +00:00
|
|
|
git checkout main &&
|
2008-05-10 04:01:55 +00:00
|
|
|
echo even more content >> file1 &&
|
2010-04-14 22:09:57 +00:00
|
|
|
test_tick &&
|
2008-05-10 04:01:55 +00:00
|
|
|
git commit -a -m even_more_content &&
|
|
|
|
# delete the transient branch
|
|
|
|
git branch -D transient_branch &&
|
|
|
|
# pack the repo
|
|
|
|
git repack -A -d -l &&
|
|
|
|
# verify objects are packed in repository
|
|
|
|
test 3 = $(git verify-pack -v -- .git/objects/pack/*.idx |
|
2022-09-21 13:02:31 +00:00
|
|
|
grep -E "^($fsha1|$csha1|$tsha1) " |
|
2008-05-10 04:01:55 +00:00
|
|
|
sort | uniq | wc -l) &&
|
|
|
|
git show $fsha1 &&
|
|
|
|
git show $csha1 &&
|
|
|
|
git show $tsha1 &&
|
2010-04-14 22:09:57 +00:00
|
|
|
# now expire the reflog, while keeping reachable ones but expiring
|
|
|
|
# unreachables immediately
|
|
|
|
test_tick &&
|
|
|
|
sometimeago=$(( $test_tick - 10000 )) &&
|
|
|
|
git reflog expire --expire=$sometimeago --expire-unreachable=$test_tick --all &&
|
2008-05-10 04:01:55 +00:00
|
|
|
# and repack
|
|
|
|
git repack -A -d -l &&
|
|
|
|
# verify objects are retained unpacked
|
|
|
|
test 0 = $(git verify-pack -v -- .git/objects/pack/*.idx |
|
2022-09-21 13:02:31 +00:00
|
|
|
grep -E "^($fsha1|$csha1|$tsha1) " |
|
2008-05-10 04:01:55 +00:00
|
|
|
sort | uniq | wc -l) &&
|
|
|
|
git show $fsha1 &&
|
|
|
|
git show $csha1 &&
|
|
|
|
git show $tsha1
|
|
|
|
'
|
|
|
|
|
2008-06-29 00:25:05 +00:00
|
|
|
compare_mtimes ()
|
|
|
|
{
|
2018-04-06 22:19:47 +00:00
|
|
|
read tref &&
|
|
|
|
while read t; do
|
2015-03-25 05:29:10 +00:00
|
|
|
test "$tref" = "$t" || return 1
|
2009-01-28 09:52:26 +00:00
|
|
|
done
|
2008-06-29 00:25:05 +00:00
|
|
|
}
|
|
|
|
|
2008-11-13 20:11:46 +00:00
|
|
|
test_expect_success '-A without -d option leaves unreachable objects packed' '
|
2008-06-29 00:25:05 +00:00
|
|
|
fsha1path=$(echo "$fsha1" | sed -e "s|\(..\)|\1/|") &&
|
|
|
|
fsha1path=".git/objects/$fsha1path" &&
|
|
|
|
csha1path=$(echo "$csha1" | sed -e "s|\(..\)|\1/|") &&
|
|
|
|
csha1path=".git/objects/$csha1path" &&
|
|
|
|
tsha1path=$(echo "$tsha1" | sed -e "s|\(..\)|\1/|") &&
|
|
|
|
tsha1path=".git/objects/$tsha1path" &&
|
|
|
|
git branch transient_branch $csha1 &&
|
|
|
|
git repack -a -d -l &&
|
|
|
|
test ! -f "$fsha1path" &&
|
|
|
|
test ! -f "$csha1path" &&
|
|
|
|
test ! -f "$tsha1path" &&
|
|
|
|
test 1 = $(ls -1 .git/objects/pack/pack-*.pack | wc -l) &&
|
|
|
|
packfile=$(ls .git/objects/pack/pack-*.pack) &&
|
|
|
|
git branch -D transient_branch &&
|
2010-04-14 22:09:57 +00:00
|
|
|
test_tick &&
|
2008-06-29 00:25:05 +00:00
|
|
|
git repack -A -l &&
|
2008-11-13 20:11:46 +00:00
|
|
|
test ! -f "$fsha1path" &&
|
|
|
|
test ! -f "$csha1path" &&
|
|
|
|
test ! -f "$tsha1path" &&
|
|
|
|
git show $fsha1 &&
|
|
|
|
git show $csha1 &&
|
|
|
|
git show $tsha1
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'unpacked objects receive timestamp of pack file' '
|
|
|
|
tmppack=".git/objects/pack/tmp_pack" &&
|
|
|
|
ln "$packfile" "$tmppack" &&
|
|
|
|
git repack -A -l -d &&
|
2018-04-25 04:29:00 +00:00
|
|
|
test-tool chmtime --get "$tmppack" "$fsha1path" "$csha1path" "$tsha1path" \
|
2009-01-28 09:52:26 +00:00
|
|
|
> mtimes &&
|
|
|
|
compare_mtimes < mtimes
|
2008-06-29 00:25:05 +00:00
|
|
|
'
|
|
|
|
|
2012-04-07 10:30:09 +00:00
|
|
|
test_expect_success 'do not bother loosening old objects' '
|
|
|
|
obj1=$(echo one | git hash-object -w --stdin) &&
|
|
|
|
obj2=$(echo two | git hash-object -w --stdin) &&
|
|
|
|
pack1=$(echo $obj1 | git pack-objects .git/objects/pack/pack) &&
|
|
|
|
pack2=$(echo $obj2 | git pack-objects .git/objects/pack/pack) &&
|
|
|
|
git prune-packed &&
|
|
|
|
git cat-file -p $obj1 &&
|
|
|
|
git cat-file -p $obj2 &&
|
2018-03-24 07:44:31 +00:00
|
|
|
test-tool chmtime =-86400 .git/objects/pack/pack-$pack2.pack &&
|
2012-04-07 10:30:09 +00:00
|
|
|
git repack -A -d --unpack-unreachable=1.hour.ago &&
|
|
|
|
git cat-file -p $obj1 &&
|
|
|
|
test_must_fail git cat-file -p $obj2
|
|
|
|
'
|
|
|
|
|
gc: introduce `gc.recentObjectsHook`
This patch introduces a new multi-valued configuration option,
`gc.recentObjectsHook`, as a means to mark certain objects as recent (and
thus exempt from garbage collection), regardless of their age.
When performing a garbage collection operation on a repository with
unreachable objects, Git decides what to do with those objects based on
how recent they are. Generally speaking, unreachable-but-recent objects
stay in the repository, and older objects are discarded.
However, we have no convenient way to keep certain precious, unreachable
objects around in the repository, even if they have aged out and would
be pruned. Our options today consist of:
- Point references at the reachability tips of any objects you
consider precious, which may be undesirable or infeasible if there
are many such objects.
- Track them via the reflog, which may be undesirable since the
reflog's lifetime is limited to that of the reference it's tracking
(and callers may want to keep those unreachable objects around for
longer).
- Extend the grace period, which may keep around other objects that
the caller *does* want to discard.
- Manually modify the mtimes of objects you want to keep. If those
objects are already loose, this is easy enough to do (you can just
enumerate and `touch -m` each one).
But if they are packed, you will either end up modifying the mtimes
of *all* objects in that pack, or be forced to write out a loose
copy of that object, both of which may be undesirable. Even worse,
if they are in a cruft pack, that requires modifying its `*.mtimes`
file by hand, since there is no exposed plumbing for this.
- Force the caller to construct the pack of objects they want
to keep themselves, and then mark the pack as kept by adding a
".keep" file. This works, but is burdensome for the caller, and
having extra packs is awkward as you roll forward your cruft pack.
This patch introduces a new option to the above list via the
`gc.recentObjectsHook` configuration, which allows the caller to
specify a program (or set of programs) whose output is treated as a set
of objects to treat as recent, regardless of their true age.
The implementation is straightforward. Git enumerates recent objects via
`add_unseen_recent_objects_to_traversal()`, which enumerates loose and
packed objects, and eventually calls `add_recent_object()` on any objects
for which `want_recent_object()`'s conditions are met.
This patch modifies the recency condition from simply "is the mtime of
this object more recent than the cutoff?" to "[...] or, is this object
mentioned by at least one `gc.recentObjectsHook`?".
Depending on whether or not we are generating a cruft pack, this allows
the caller to do one of two things:
- If generating a cruft pack, the caller is able to retain additional
objects via the cruft pack, even if they would have otherwise been
pruned due to their age.
- If not generating a cruft pack, the caller is likewise able to
retain additional objects as loose.
A potential alternative here is to introduce a new mode to alter the
contents of the reachable pack instead of the cruft one. One could
imagine a new option to `pack-objects`, say `--extra-reachable-tips`
that does the same thing as above, adding the visited set of objects
along the traversal to the pack.
But this has the unfortunate side-effect of altering the reachability
closure of that pack. If parts of the unreachable object graph mentioned
by one or more of the "extra reachable tips" programs are not closed,
then the resulting pack won't be either. This makes it impossible in the
general case to write out reachability bitmaps for that pack, since
closure is a requirement there.
Instead, keep these unreachable objects in the cruft pack (or set of
unreachable, loose objects), to ensure that we can continue to
have a pack containing just reachable objects, which is always safe to
write a bitmap over.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
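The modified recency condition described in this message can be sketched in plain shell. This is only an illustration of the stated rule, not Git's implementation: the function name `is_recent`, its argument order, and the idea of reading the hook output from a file are all assumptions made for the example. It also checks only objects the hook names directly, whereas the test below expects objects reachable from a rescued object to survive as well.

```shell
#!/bin/sh
# Hypothetical sketch of the recency rule:
#   recent = (mtime newer than cutoff) OR (object named by a hook)
#
# usage: is_recent <oid> <mtime> <cutoff> <hook-output-file>
# <hook-output-file> holds one object ID per line, as a
# gc.recentObjectsHook program would print them.
is_recent () {
	oid=$1
	mtime=$2
	cutoff=$3
	hook_output=$4

	# Original condition: the object is newer than the cutoff.
	if test "$mtime" -gt "$cutoff"
	then
		return 0
	fi

	# Added condition: the object is listed verbatim (whole line,
	# fixed string) in the hook output.
	grep -qxF -e "$oid" "$hook_output"
}
```

With a cutoff of 100, an object with mtime 50 counts as recent only if its ID appears in the hook output file, while any object with mtime 200 counts as recent regardless.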
2023-06-07 22:58:17 +00:00
|
|
|
test_expect_success 'gc.recentObjectsHook' '
|
|
|
|
obj1=$(echo one | git hash-object -w --stdin) &&
|
|
|
|
obj2=$(echo two | git hash-object -w --stdin) &&
|
|
|
|
obj3=$(echo three | git hash-object -w --stdin) &&
|
|
|
|
pack1=$(echo $obj1 | git pack-objects .git/objects/pack/pack) &&
|
|
|
|
pack2=$(echo $obj2 | git pack-objects .git/objects/pack/pack) &&
|
|
|
|
pack3=$(echo $obj3 | git pack-objects .git/objects/pack/pack) &&
|
|
|
|
git prune-packed &&
|
|
|
|
|
|
|
|
git cat-file -p $obj1 &&
|
|
|
|
git cat-file -p $obj2 &&
|
|
|
|
git cat-file -p $obj3 &&
|
|
|
|
|
2023-06-24 14:33:47 +00:00
|
|
|
# make an unreachable annotated tag object to ensure we rescue objects
|
|
|
|
# which are reachable from non-pruned unreachable objects
|
|
|
|
obj2_tag="$(git mktag <<-EOF
|
|
|
|
object $obj2
|
|
|
|
type blob
|
|
|
|
tag obj2-tag
|
|
|
|
tagger T A Gger <tagger@example.com> 1234567890 -0000
|
|
|
|
EOF
|
|
|
|
)" &&
|
|
|
|
|
|
|
|
obj2_tag_pack="$(echo $obj2_tag | git pack-objects .git/objects/pack/pack)" &&
|
|
|
|
git prune-packed &&
|
2023-06-07 22:58:17 +00:00
|
|
|
|
|
|
|
write_script precious-objects <<-EOF &&
|
|
|
|
echo $obj2_tag
|
|
|
|
EOF
|
|
|
|
git config gc.recentObjectsHook ./precious-objects &&
|
|
|
|
|
|
|
|
test-tool chmtime =-86400 .git/objects/pack/pack-$pack2.pack &&
|
|
|
|
test-tool chmtime =-86400 .git/objects/pack/pack-$pack3.pack &&
|
2023-06-24 14:33:47 +00:00
|
|
|
test-tool chmtime =-86400 .git/objects/pack/pack-$obj2_tag_pack.pack &&
|
2023-06-07 22:58:17 +00:00
|
|
|
git repack -A -d --unpack-unreachable=1.hour.ago &&
|
|
|
|
|
|
|
|
git cat-file -p $obj1 &&
|
|
|
|
git cat-file -p $obj2 &&
|
|
|
|
git cat-file -p $obj2_tag &&
|
|
|
|
test_must_fail git cat-file -p $obj3
|
|
|
|
'
|
|
|
|
|
2014-10-17 00:44:49 +00:00
|
|
|
test_expect_success 'keep packed objects found only in index' '
|
|
|
|
echo my-unique-content >file &&
|
|
|
|
git add file &&
|
|
|
|
git commit -m "make it reachable" &&
|
|
|
|
git gc &&
|
|
|
|
git reset HEAD^ &&
|
|
|
|
git reflog expire --expire=now --all &&
|
|
|
|
git add file &&
|
2018-03-24 07:44:31 +00:00
|
|
|
test-tool chmtime =-86400 .git/objects/pack/* &&
|
2014-10-17 00:44:49 +00:00
|
|
|
git gc --prune=1.hour.ago &&
|
|
|
|
git cat-file blob :file
|
|
|
|
'
|
|
|
|
|
repack: add --keep-unreachable option
The usual way to do a full repack (and what is done by
git-gc) is to run "repack -Ad --unpack-unreachable=<when>",
which will loosen any unreachable objects newer than
"<when>", and drop any older ones.
This is a safer alternative to "repack -ad", because
"<when>" becomes a grace period during which we will not
drop any new objects that are about to be referenced.
However, it isn't perfectly safe. It's always possible that
a process is about to reference an old object. Even if that
process were to take care to update the timestamp on the
object, there is no atomicity with a simultaneously running
"repack" process.
So while unlikely, there is a small race wherein we may drop
an object that is in the process of being referenced. If you
do automated repacking on a large number of active
repositories, you may hit it eventually, and the result is a
corrupted repository.
It would be nice to fix that race in the long run, but it's
complicated. In the meantime, there is a much simpler
strategy for automated repository maintenance: do not drop
objects at all. We already have a "--keep-unreachable"
option in pack-objects; we just need to plumb it through
from git-repack.
Note that this _isn't_ plumbed through from git-gc, so at
this point it's strictly a tool for people doing their own
advanced repository maintenance strategy.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-13 04:36:28 +00:00
|
|
|
test_expect_success 'repack -k keeps unreachable packed objects' '
|
|
|
|
# create packed-but-unreachable object
|
|
|
|
sha1=$(echo unreachable-packed | git hash-object -w --stdin) &&
|
|
|
|
pack=$(echo $sha1 | git pack-objects .git/objects/pack/pack) &&
|
|
|
|
git prune-packed &&
|
|
|
|
|
|
|
|
# -k should keep it
|
|
|
|
git repack -adk &&
|
|
|
|
git cat-file -p $sha1 &&
|
|
|
|
|
|
|
|
# and double check that without -k it would have been removed
|
|
|
|
git repack -ad &&
|
|
|
|
test_must_fail git cat-file -p $sha1
|
|
|
|
'
|
|
|
|
|
repack: extend --keep-unreachable to loose objects
If you use "repack -adk" currently, we will pack all objects
that are already packed into the new pack, and then drop the
old packs. However, loose unreachable objects will be left
as-is. In theory these are meant to expire eventually with
"git prune". But if you are using "repack -k", you probably
want to keep things forever and therefore do not run "git
prune" at all. That means those loose objects may build up over
time and end up fooling any object-count heuristics (such as
the one done by "gc --auto", though since git-gc does not
support "repack -k", this really applies to whatever custom
scripts people might have driving "repack -k").
With this patch, we instead stuff any loose unreachable
objects into the pack along with the already-packed
unreachable objects. This may seem wasteful, but it is
really no more so than using "repack -k" in the first place.
We are at a slight disadvantage, in that we have no useful
ordering for the result, or names to hand to the delta code.
However, this is again no worse than what "repack -k" is
already doing for the packed objects. The packing of these
objects doesn't matter much because they should not be
accessed frequently (unless they actually _do_ become
referenced, but then they would get moved to a different
part of the packfile during the next repack).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
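The test for this change finds a loose object's file by splitting the object ID after its first two hex digits, using the same `sed` expression that appears in the test body. A minimal sketch of that mapping follows; the function name `loose_object_path` is hypothetical, and the `.git/objects` prefix assumes the default repository layout used by the test.

```shell
#!/bin/sh
# Hypothetical helper: map an object ID to the path of its loose
# object file under the default .git/objects directory.
loose_object_path () {
	# "s,..,&/," rewrites the first two characters as themselves
	# followed by a slash, e.g. "abcdef" becomes "ab/cdef".
	printf '.git/objects/%s\n' "$(printf '%s\n' "$1" | sed 's,..,&/,')"
}
```

A caller could then check `test -f "$(loose_object_path "$sha1")"` before and after `git repack -adk` to see whether the object is still loose.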
2016-06-13 04:38:04 +00:00
|
|
|
test_expect_success 'repack -k packs unreachable loose objects' '
|
|
|
|
# create loose unreachable object
|
|
|
|
sha1=$(echo would-be-deleted-loose | git hash-object -w --stdin) &&
|
|
|
|
objpath=.git/objects/$(echo $sha1 | sed "s,..,&/,") &&
|
|
|
|
test_path_is_file $objpath &&
|
|
|
|
|
|
|
|
# and confirm that the loose object goes away, but we can
|
|
|
|
# still access it (ergo, it is packed)
|
|
|
|
git repack -adk &&
|
|
|
|
test_path_is_missing $objpath &&
|
|
|
|
git cat-file -p $sha1
|
|
|
|
'
|
|
|
|
|
2008-05-10 04:01:55 +00:00
|
|
|
test_done
|