mirror of
https://github.com/git/git
synced 2024-11-05 18:59:29 +00:00
37dc6d8104
Cruft packs are an alternative mechanism for storing a collection of unreachable objects whose mtimes are recent enough to avoid being pruned out of the repository. When cruft packs were first introduced back in b757353676 (builtin/pack-objects.c: --cruft without expiration, 2022-05-20) and a7d493833f (builtin/pack-objects.c: --cruft with expiration, 2022-05-20), the recommended workflow consisted of:

  - Repacking periodically, either by packing anything loose in the repository (via `git repack -d`) or producing a geometric sequence of packs (via `git repack --geometric=<d> -d`).

  - Every so often, splitting the repository into two packs: one cruft pack to store the unreachable objects, and another non-cruft pack to store the reachable objects.

Repositories may (out of band with the above) periodically choose to prune out some unreachable objects which have aged out of the grace period by generating a pack with `--cruft-expiration=<approxidate>`.

This allowed repositories to maintain relatively few packs on average, and to quarantine unreachable objects together in a cruft pack, avoiding the pitfalls of holding unreachable objects as loose while they age out (for more, see some of the details in 3d89a8c118 (Documentation/technical: add cruft-packs.txt, 2022-05-20)).

This all works, but can be costly from an I/O perspective when frequently repacking a repository that has many unreachable objects. This problem is exacerbated when those unreachable objects are rarely (if ever) pruned.

Since there is at most one cruft pack in the above scheme, each time we update the cruft pack it must be rewritten from scratch. Because much of the pack is reused, this is a relatively inexpensive operation from a CPU perspective, but it is very costly in terms of I/O, since we end up rewriting basically the same pack (plus any new unreachable objects that have entered the repository since the last time a cruft pack was generated).

At the time, we decided against implementing more robust support for multiple cruft packs. This patch implements the support we were lacking.

Introduce a new option `--max-cruft-size` which allows repositories to accumulate cruft packs up to a given size, after which point a new generation of cruft packs can accumulate until it reaches the maximum size, and so on. To generate a new cruft pack, the process works like so:

  - Sort a list of any existing cruft packs in ascending order of pack size.

  - Starting from the beginning of the list, group cruft packs together while the accumulated size is smaller than the maximum specified pack size.

  - Combine the objects in these cruft packs into a new cruft pack, along with any other unreachable objects which have since entered the repository.

Once a cruft pack grows beyond the size specified via `--max-cruft-size`, the pack is effectively frozen. This limits the I/O churn to a quadratic function of the value specified by the `--max-cruft-size` option, instead of behaving quadratically in the total number of unreachable objects.
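The grouping step described above can be sketched in portable shell. This is an illustrative sketch, not the code in `builtin/repack.c`: the function name `select_cruft_packs_to_combine`, its "SIZE NAME" input format, and reading "smaller than the maximum" as a strict bound are all assumptions made for this example.

```sh
#!/bin/sh
# Hypothetical helper (not part of Git): given a maximum size in bytes as
# $1 and one "SIZE NAME" line per existing cruft pack on stdin, print the
# names of the packs that would be combined into the next cruft pack.
# Packs that are not printed are left untouched in this round.
select_cruft_packs_to_combine () {
	max_size=$1
	total=0

	# Sort ascending by pack size, then walk from the smallest pack,
	# keeping a running total. A pack is selected only while the total
	# stays strictly below max_size (assumed reading of "smaller than").
	# Because the input is sorted, once one pack does not fit, no later
	# (larger or equal) pack can fit either, so the loop stops there.
	sort -n -k1,1 |
	while read -r size name
	do
		proposed=$((total + size))
		if test "$proposed" -ge "$max_size"
		then
			break
		fi
		total=$proposed
		printf '%s\n' "$name"
	done
}

# Example: four cruft packs and a 1000-byte limit.
printf '%s\n' '300 pack-c' '100 pack-a' '900 pack-d' '200 pack-b' |
	select_cruft_packs_to_combine 1000
```

In this example `pack-a`, `pack-b` and `pack-c` (600 bytes in total) would be rewritten, together with any new unreachable objects, into a single new cruft pack, while `pack-d` is left untouched because adding it would push the running total to 1500.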
When pruning unreachable objects, we bypass the new code paths which combine small cruft packs together, and instead start from scratch, passing the appropriate `--max-pack-size` down to `pack-objects` and putting it in charge of keeping the resulting set of cruft packs sized correctly.

This may seem like further I/O churn, but in practice it isn't so bad. We could prune old cruft packs for which all or most objects are removed, and then generate a new cruft pack with just the remaining set of objects. But this additional complexity buys us relatively little, because most objects end up being pruned anyway, so the I/O churn is well contained.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
395 lines
11 KiB
Bash
Executable file
#!/bin/sh

test_description='basic git gc tests
'

. ./test-lib.sh
. "$TEST_DIRECTORY"/lib-terminal.sh

test_expect_success 'setup' '
	# do not let the amount of physical memory affect gc
	# behavior; make sure we always pack everything into one pack by
	# default
	git config gc.bigPackThreshold 2g &&

	# These are simply values which, when hashed as a blob with a newline,
	# produce a hash where the first byte is 0x17 in their respective
	# algorithms.
	test_oid_cache <<-EOF
	obj1 sha1:263
	obj1 sha256:34

	obj2 sha1:410
	obj2 sha256:174

	obj3 sha1:523
	obj3 sha256:313

	obj4 sha1:790
	obj4 sha256:481
	EOF
'

test_expect_success 'gc empty repository' '
	git gc
'

test_expect_success 'gc does not leave behind pid file' '
	git gc &&
	test_path_is_missing .git/gc.pid
'

test_expect_success 'gc --gobbledegook' '
	test_expect_code 129 git gc --nonsense 2>err &&
	test_i18ngrep "[Uu]sage: git gc" err
'

test_expect_success 'gc -h with invalid configuration' '
	mkdir broken &&
	(
		cd broken &&
		git init &&
		echo "[gc] pruneexpire = CORRUPT" >>.git/config &&
		test_expect_code 129 git gc -h >usage 2>&1
	) &&
	test_i18ngrep "[Uu]sage" broken/usage
'

test_expect_success 'gc is not aborted due to a stale symref' '
	git init remote &&
	(
		cd remote &&
		test_commit initial &&
		git clone . ../client &&
		git branch -m develop &&
		cd ../client &&
		git fetch --prune &&
		git gc
	)
'

test_expect_success 'gc --keep-largest-pack' '
	test_create_repo keep-pack &&
	(
		cd keep-pack &&
		test_commit one &&
		test_commit two &&
		test_commit three &&
		git gc &&
		( cd .git/objects/pack && ls *.pack ) >pack-list &&
		test_line_count = 1 pack-list &&
		cp pack-list base-pack-list &&
		test_commit four &&
		git repack -d &&
		test_commit five &&
		git repack -d &&
		( cd .git/objects/pack && ls *.pack ) >pack-list &&
		test_line_count = 3 pack-list &&
		git gc --keep-largest-pack &&
		( cd .git/objects/pack && ls *.pack ) >pack-list &&
		test_line_count = 2 pack-list &&
		awk "/^P /{print \$2}" <.git/objects/info/packs >pack-info &&
		test_line_count = 2 pack-info &&
		test_path_is_file .git/objects/pack/$(cat base-pack-list) &&
		git fsck
	)
'

test_expect_success 'pre-auto-gc hook can stop auto gc' '
	cat >err.expect <<-\EOF &&
	no gc for you
	EOF

	git init pre-auto-gc-hook &&
	test_hook -C pre-auto-gc-hook pre-auto-gc <<-\EOF &&
	echo >&2 no gc for you &&
	exit 1
	EOF
	(
		cd pre-auto-gc-hook &&

		git config gc.auto 3 &&
		git config gc.autoDetach false &&

		# We need to create two objects whose IDs start with 17,
		# since this is what git gc counts. As it happens, these
		# two blobs will do so.
		test_commit "$(test_oid obj1)" &&
		test_commit "$(test_oid obj2)" &&

		git gc --auto >../out.actual 2>../err.actual
	) &&
	test_must_be_empty out.actual &&
	test_cmp err.expect err.actual &&

	cat >err.expect <<-\EOF &&
	will gc for you
	Auto packing the repository for optimum performance.
	See "git help gc" for manual housekeeping.
	EOF

	test_hook -C pre-auto-gc-hook --clobber pre-auto-gc <<-\EOF &&
	echo >&2 will gc for you &&
	exit 0
	EOF

	git -C pre-auto-gc-hook gc --auto >out.actual 2>err.actual &&

	test_must_be_empty out.actual &&
	test_cmp err.expect err.actual
'

test_expect_success 'auto gc with too many loose objects does not attempt to create bitmaps' '
	test_config gc.auto 3 &&
	test_config gc.autodetach false &&
	test_config pack.writebitmaps true &&
	# We need to create two objects whose IDs start with 17,
	# since this is what git gc counts. As it happens, these
	# two blobs will do so.
	test_commit "$(test_oid obj1)" &&
	test_commit "$(test_oid obj2)" &&
	# Our first gc will create a pack; our second will create a second pack
	git gc --auto &&
	ls .git/objects/pack/pack-*.pack | sort >existing_packs &&
	test_commit "$(test_oid obj3)" &&
	test_commit "$(test_oid obj4)" &&

	git gc --auto 2>err &&
	test_i18ngrep ! "^warning:" err &&
	ls .git/objects/pack/pack-*.pack | sort >post_packs &&
	comm -1 -3 existing_packs post_packs >new &&
	comm -2 -3 existing_packs post_packs >del &&
	test_line_count = 0 del && # No packs are deleted
	test_line_count = 1 new # There is one new pack
'

test_expect_success 'gc --no-quiet' '
	GIT_PROGRESS_DELAY=0 git -c gc.writeCommitGraph=true gc --no-quiet >stdout 2>stderr &&
	test_must_be_empty stdout &&
	test_i18ngrep "Computing commit graph generation numbers" stderr
'

test_expect_success TTY 'with TTY: gc --no-quiet' '
	test_terminal env GIT_PROGRESS_DELAY=0 \
		git -c gc.writeCommitGraph=true gc --no-quiet >stdout 2>stderr &&
	test_must_be_empty stdout &&
	test_i18ngrep "Enumerating objects" stderr &&
	test_i18ngrep "Computing commit graph generation numbers" stderr
'

test_expect_success 'gc --quiet' '
	git -c gc.writeCommitGraph=true gc --quiet >stdout 2>stderr &&
	test_must_be_empty stdout &&
	test_must_be_empty stderr
'

test_expect_success 'gc.reflogExpire{Unreachable,}=never skips "expire" via "gc"' '
	test_config gc.reflogExpire never &&
	test_config gc.reflogExpireUnreachable never &&

	GIT_TRACE=$(pwd)/trace.out git gc &&

	# Check that git-pack-refs is run as a sanity check (done via
	# gc_before_repack()) but that git-reflog-expire is not.
	grep -E "^trace: (built-in|exec|run_command): git pack-refs --" trace.out &&
	! grep -E "^trace: (built-in|exec|run_command): git reflog expire --" trace.out
'

test_expect_success 'one of gc.reflogExpire{Unreachable,}=never does not skip "expire" via "gc"' '
	>trace.out &&
	test_config gc.reflogExpire never &&
	GIT_TRACE=$(pwd)/trace.out git gc &&
	grep -E "^trace: (built-in|exec|run_command): git reflog expire --" trace.out
'

prepare_cruft_history () {
	test_commit base &&

	test_commit --no-tag foo &&
	test_commit --no-tag bar &&
	git reset HEAD^^
}

assert_no_cruft_packs () {
	find .git/objects/pack -name "*.mtimes" >mtimes &&
	test_must_be_empty mtimes
}

for argv in \
	"gc" \
	"-c gc.cruftPacks=true gc" \
	"-c gc.cruftPacks=false gc --cruft"
do
	test_expect_success "git $argv generates a cruft pack" '
		test_when_finished "rm -fr repo" &&
		git init repo &&
		(
			cd repo &&

			prepare_cruft_history &&
			git $argv &&

			find .git/objects/pack -name "*.mtimes" >mtimes &&
			sed -e "s/\.mtimes$/\.pack/g" mtimes >packs &&

			test_file_not_empty packs &&
			while read pack
			do
				test_path_is_file "$pack" || return 1
			done <packs
		)
	'
done

for argv in \
	"gc --no-cruft" \
	"-c gc.cruftPacks=false gc" \
	"-c gc.cruftPacks=true gc --no-cruft"
do
	test_expect_success "git $argv does not generate a cruft pack" '
		test_when_finished "rm -fr repo" &&
		git init repo &&
		(
			cd repo &&

			prepare_cruft_history &&
			git $argv &&

			assert_no_cruft_packs
		)
	'
done

test_expect_success 'gc.bigPackThreshold ignores cruft packs' '
	test_when_finished "rm -fr repo" &&
	git init repo &&
	(
		cd repo &&

		# Generate a pack for reachable objects (of which there
		# are 3), and one for unreachable objects (of which
		# there are 6).
		prepare_cruft_history &&
		git gc --cruft &&

		mtimes="$(find .git/objects/pack -type f -name "pack-*.mtimes")" &&
		sz="$(test_file_size "${mtimes%.mtimes}.pack")" &&

		# Ensure that the cruft pack gets removed (due to
		# `--prune=now`) despite it being the largest pack.
		git -c gc.bigPackThreshold=$sz gc --cruft --prune=now &&

		assert_no_cruft_packs
	)
'

test_expect_success '--keep-largest-pack ignores cruft packs' '
	test_when_finished "rm -fr repo" &&
	git init repo &&
	(
		cd repo &&

		# Generate a pack for reachable objects (of which there
		# are 3), and one for unreachable objects (of which
		# there are 6).
		prepare_cruft_history &&
		git gc --cruft &&

		# Ensure that the cruft pack gets removed (due to
		# `--prune=now`) despite it being the largest pack.
		git gc --cruft --prune=now --keep-largest-pack &&

		assert_no_cruft_packs
	)
'

cruft_max_size_opts="git repack -d -l --cruft --cruft-expiration=2.weeks.ago"

test_expect_success 'setup for --max-cruft-size tests' '
	git init cruft--max-size &&
	(
		cd cruft--max-size &&
		prepare_cruft_history
	)
'

test_expect_success '--max-cruft-size sets appropriate repack options' '
	GIT_TRACE2_EVENT=$(pwd)/trace2.txt git -C cruft--max-size \
		gc --cruft --max-cruft-size=1M &&
	test_subcommand $cruft_max_size_opts --max-cruft-size=1048576 <trace2.txt
'

test_expect_success 'gc.maxCruftSize sets appropriate repack options' '
	GIT_TRACE2_EVENT=$(pwd)/trace2.txt \
		git -C cruft--max-size -c gc.maxCruftSize=2M gc --cruft &&
	test_subcommand $cruft_max_size_opts --max-cruft-size=2097152 <trace2.txt &&

	GIT_TRACE2_EVENT=$(pwd)/trace2.txt \
		git -C cruft--max-size -c gc.maxCruftSize=2M gc --cruft \
		--max-cruft-size=3M &&
	test_subcommand $cruft_max_size_opts --max-cruft-size=3145728 <trace2.txt
'

run_and_wait_for_auto_gc () {
	# We read stdout from gc for the side effect of waiting until the
	# background gc process exits, closing its fd 9. Furthermore, the
	# variable assignment from a command substitution preserves the
	# exit status of the main gc process.
	# Note: this fd trickery does not work on Windows, but it is not
	# needed there, because on Windows auto gc always runs in the foreground.
	doesnt_matter=$(git gc --auto 9>&1)
}

test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
	test_commit foo &&
	test_commit bar &&
	git repack &&
	test_config gc.autopacklimit 1 &&
	test_config gc.autodetach true &&
	echo fleem >.git/gc.log &&
	git gc --auto 2>err &&
	test_i18ngrep "^warning:" err &&
	test_config gc.logexpiry 5.days &&
	test-tool chmtime =-345600 .git/gc.log &&
	git gc --auto &&
	test_config gc.logexpiry 2.days &&
	run_and_wait_for_auto_gc &&
	ls .git/objects/pack/pack-*.pack >packs &&
	test_line_count = 1 packs
'

test_expect_success 'background auto gc respects lock for all operations' '
	# make sure we run a background auto-gc
	test_commit make-pack &&
	git repack &&
	test_config gc.autopacklimit 1 &&
	test_config gc.autodetach true &&

	# create a ref whose loose presence we can use to detect a pack-refs run
	git update-ref refs/heads/should-be-loose HEAD &&
	(ls -1 .git/refs/heads .git/reftable >expect || true) &&

	# now fake a concurrent gc that holds the lock; we can use our
	# shell pid so that it looks valid.
	hostname=$(hostname || echo unknown) &&
	shell_pid=$$ &&
	if test_have_prereq MINGW && test -f /proc/$shell_pid/winpid
	then
		# In Git for Windows, Bash (actually, the MSYS2 runtime) has a
		# different idea of PIDs than git.exe (actually Windows). Use
		# the Windows PID in this case.
		shell_pid=$(cat /proc/$shell_pid/winpid)
	fi &&
	printf "%d %s" "$shell_pid" "$hostname" >.git/gc.pid &&

	# our gc should exit zero without doing anything
	run_and_wait_for_auto_gc &&
	(ls -1 .git/refs/heads .git/reftable >actual || true) &&
	test_cmp expect actual
'

# DO NOT leave a detached auto gc process running near the end of the
# test script: it can run long enough in the background to racily
# interfere with the cleanup in 'test_done'.

test_done