git/t/t7700-repack.sh

836 lines
25 KiB
Bash
Raw Normal View History

#!/bin/sh
test_description='git repack works correctly'
. ./test-lib.sh
builtin/repack.c: support writing a MIDX while repacking Teach `git repack` a new `--write-midx` option for callers that wish to persist a multi-pack index in their repository while repacking. There are two existing alternatives to this new flag, but they don't cover our particular use-case. These alternatives are: - Call 'git multi-pack-index write' after running 'git repack', or - Set 'GIT_TEST_MULTI_PACK_INDEX=1' in your environment when running 'git repack'. The former works, but introduces a gap in bitmap coverage between repacking and writing a new MIDX (since the repack may have deleted a pack included in the existing MIDX, invalidating it altogether). Setting the 'GIT_TEST_' environment variable is obviously unsupported. In fact, even if it were supported officially, it still wouldn't work, because it generates the MIDX *after* redundant packs have been dropped, leading to the same issue as above. Introduce a new option which eliminates this race by teaching `git repack` to generate the MIDX at the critical point: after the new packs have been written and moved into place, but before the redundant packs have been removed. This option is compatible with `git repack`'s '--bitmap' option (it changes the interpretation to be: "write a bitmap corresponding to the MIDX after one has been generated"). There is a little bit of additional noise in the patch below to avoid repeating ourselves when selecting which packs to delete. Instead of a single loop as before (where we iterate over 'existing_packs', decide if a pack is worth deleting, and if so, delete it), we have two loops (the first where we decide which ones are worth deleting, and the second where we actually do the deleting). This makes it so we have a single check we can make consistently when (1) telling the MIDX which packs we want to exclude, and (2) actually unlinking the redundant packs. There is also a tiny change to short-circuit the body of write_midx_included_packs() when no packs remain in the case of an empty repository. The MIDX code does not handle this, so avoid trying to generate a MIDX covering zero packs in the first place. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-29 01:55:18 +00:00
. "${TEST_DIRECTORY}/lib-bitmap.sh"
. "${TEST_DIRECTORY}/lib-midx.sh"
. "${TEST_DIRECTORY}/lib-terminal.sh"
commit_and_pack () {
test_commit "$@" 1>&2 &&
incrpackid=$(git pack-objects --all --unpacked --incremental .git/objects/pack/pack </dev/null) &&
builtin/repack.c: only collect fully-formed packs To partition the set of packs based on which ones are "kept" (either they have a .keep file, or were otherwise marked via the `--keep-pack` option) and "non-kept" ones (anything else), `git repack` uses its `collect_pack_filenames()` function. Ordinarily, we would rely on a convenience function such as `get_all_packs()` to enumerate and partition the set of packs. But `collect_pack_filenames()` uses `readdir()` directly to read the contents of the "$GIT_DIR/objects/pack" directory, and adds each entry ending in ".pack" to the appropriate list (either kept, or non-kept as above). This is subtly racy, since `collect_pack_filenames()` may see a pack that is not fully staged (i.e., it is missing its ".idx" file). Ordinarily, this doesn't cause a problem. But it can cause issues when generating a cruft pack. This is because `git repack` feeds (among other things) the list of existing kept packs down to `git pack-objects --cruft` to indicate that any kept packs will not be removed from the repository (so that the cruft pack machinery can avoid packing objects that appear in those packs as cruft). But `read_cruft_objects()` lists packfiles by calling `get_all_packs()`. So if a ".pack" file exists (necessary to get that pack to appear to `collect_pack_filenames()`), but doesn't have a corresponding ".idx" file (necessary to get that pack to appear via `get_all_packs()`), we'll complain with: fatal: could not find pack '.tmp-5841-pack-a6b0150558609c323c496ced21de6f4b66589260.pack' Fix the above by teaching `collect_pack_filenames()` to only collect packs with their corresponding `*.idx` files in place, indicating that those packs have been fully staged. There are a couple of things worth noting: - Since each entry in the `extra_keep` list (which contains the `--keep-pack` names) has a `*.pack` suffix, we'll have to swap the suffix from ".pack" to ".idx", and compare that instead. - Since we use the the `fname_kept_list` to figure out which packs to delete (with `git repack -d`), we would have previously deleted a `*.pack` with no index (since the existince of a ".pack" file is necessary and sufficient to include that pack in the list of existing non-kept packs). Now we will leave it alone (since that pack won't appear in the list). This is far more correct behavior, since we don't want to race with a pack being staged. Deleting a partially staged pack is unlikely, however, since the window of time between staging a pack and moving its .idx file into place is miniscule. Note that this window does *not* include the time it takes to receive and index the pack, since the incoming data goes into "$GIT_DIR/objects/tmp_pack_XXXXXX", which does not end in ".pack" and is thus ignored by collect_pack_filenames(). In the future, this function should probably be rewritten as a callback to `for_each_file_in_pack_dir()`, but this is the simplest change we could do in the short-term. Reported-by: Michael Haggerty <mhagger@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-07 10:16:17 +00:00
# Remove any loose object(s) created by test_commit, since they have
# already been packed. Leaving these around can create subtly different
# packs with `pack-objects`'s `--unpacked` option.
git prune-packed 1>&2 &&
echo pack-${incrpackid}.pack
}
test_no_missing_in_packs () {
myidx=$(ls -1 .git/objects/pack/*.idx) &&
test_path_is_file "$myidx" &&
git verify-pack -v alt_objects/pack/*.idx >orig.raw &&
sed -n -e "s/^\($OID_REGEX\).*/\1/p" orig.raw | sort >orig &&
git verify-pack -v $myidx >dest.raw &&
cut -d" " -f1 dest.raw | sort >dest &&
comm -23 orig dest >missing &&
test_must_be_empty missing
}
# we expect $packid and $oid to be defined
test_has_duplicate_object () {
want_duplicate_object="$1"
found_duplicate_object=false
for p in .git/objects/pack/*.idx
do
idx=$(basename $p)
test "pack-$packid.idx" = "$idx" && continue
git verify-pack -v $p >packlist || return $?
if grep "^$oid" packlist
then
found_duplicate_object=true
echo "DUPLICATE OBJECT FOUND"
break
fi
done &&
test "$want_duplicate_object" = "$found_duplicate_object"
}
test_expect_success 'objects in packs marked .keep are not repacked' '
echo content1 >file1 &&
echo content2 >file2 &&
git add . &&
test_tick &&
git commit -m initial_commit &&
# Create two packs
# The first pack will contain all of the objects except one
git rev-list --objects --all >objs &&
grep -v file2 objs | git pack-objects pack &&
# The second pack will contain the excluded object
packid=$(grep file2 objs | git pack-objects pack) &&
>pack-$packid.keep &&
git verify-pack -v pack-$packid.idx >packlist &&
oid=$(head -n 1 packlist | sed -e "s/^\($OID_REGEX\).*/\1/") &&
mv pack-* .git/objects/pack/ &&
git repack -A -d -l &&
git prune-packed &&
test_has_duplicate_object false
'
test_expect_success 'writing bitmaps via command-line can duplicate .keep objects' '
# build on $oid, $packid, and .keep state from previous
GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 git repack -Adbl &&
test_has_duplicate_object true
repack: add `repack.packKeptObjects` config var The git-repack command always passes `--honor-pack-keep` to pack-objects. This has traditionally been a good thing, as we do not want to duplicate those objects in a new pack, and we are not going to delete the old pack. However, when bitmaps are in use, it is important for a full repack to include all reachable objects, even if they may be duplicated in a .keep pack. Otherwise, we cannot generate the bitmaps, as the on-disk format requires the set of objects in the pack to be fully closed. Even if the repository does not generally have .keep files, a simultaneous push could cause a race condition in which a .keep file exists at the moment of a repack. The repack may try to include those objects in one of two situations: 1. The pushed .keep pack contains objects that were already in the repository (e.g., blobs due to a revert of an old commit). 2. Receive-pack updates the refs, making the objects reachable, but before it removes the .keep file, the repack runs. In either case, we may prefer to duplicate some objects in the new, full pack, and let the next repack (after the .keep file is cleaned up) take care of removing them. This patch introduces both a command-line and config option to disable the `--honor-pack-keep` option. By default, it is triggered when pack.writeBitmaps (or `--write-bitmap-index` is turned on), but specifying it explicitly can override the behavior (e.g., in cases where you prefer .keep files to bitmaps, but only when they are present). Note that this option just disables the pack-objects behavior. We still leave packs with a .keep in place, as we do not necessarily know that we have duplicated all of their objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-03 20:04:20 +00:00
'
test_expect_success 'writing bitmaps via config can duplicate .keep objects' '
# build on $oid, $packid, and .keep state from previous
GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
git -c repack.writebitmaps=true repack -Adl &&
test_has_duplicate_object true
repack: add `repack.packKeptObjects` config var The git-repack command always passes `--honor-pack-keep` to pack-objects. This has traditionally been a good thing, as we do not want to duplicate those objects in a new pack, and we are not going to delete the old pack. However, when bitmaps are in use, it is important for a full repack to include all reachable objects, even if they may be duplicated in a .keep pack. Otherwise, we cannot generate the bitmaps, as the on-disk format requires the set of objects in the pack to be fully closed. Even if the repository does not generally have .keep files, a simultaneous push could cause a race condition in which a .keep file exists at the moment of a repack. The repack may try to include those objects in one of two situations: 1. The pushed .keep pack contains objects that were already in the repository (e.g., blobs due to a revert of an old commit). 2. Receive-pack updates the refs, making the objects reachable, but before it removes the .keep file, the repack runs. In either case, we may prefer to duplicate some objects in the new, full pack, and let the next repack (after the .keep file is cleaned up) take care of removing them. This patch introduces both a command-line and config option to disable the `--honor-pack-keep` option. By default, it is triggered when pack.writeBitmaps (or `--write-bitmap-index` is turned on), but specifying it explicitly can override the behavior (e.g., in cases where you prefer .keep files to bitmaps, but only when they are present). Note that this option just disables the pack-objects behavior. We still leave packs with a .keep in place, as we do not necessarily know that we have duplicated all of their objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-03 20:04:20 +00:00
'
test_expect_success 'loose objects in alternate ODB are not repacked' '
mkdir alt_objects &&
echo $(pwd)/alt_objects >.git/objects/info/alternates &&
echo content3 >file3 &&
oid=$(GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w file3) &&
git add file3 &&
test_tick &&
git commit -m commit_file3 &&
git repack -a -d -l &&
git prune-packed &&
test_has_duplicate_object false
'
object-file: use real paths when adding alternates When adding an alternate ODB, we check if the alternate has the same path as the object dir, and if so, we do nothing. However, that comparison does not resolve symlinks. This makes it possible to add the object dir as an alternate, which may result in bad behavior. For example, it can trick "git repack -a -l -d" (possibly run by "git gc") into thinking that all packs come from an alternate and delete all objects. rm -rf test && git clone https://github.com/git/git test && ( cd test && ln -s objects .git/alt-objects && # -c repack.updateserverinfo=false silences a warning about not # being able to update "info/refs", it isn't needed to show the # bad behavior GIT_ALTERNATE_OBJECT_DIRECTORIES=".git/alt-objects" git \ -c repack.updateserverinfo=false repack -a -l -d && # It's broken! git status # Because there are no more objects! ls .git/objects/pack ) Fix this by resolving symlinks and relative paths before comparing the alternate and object dir. This lets us clean up a number of issues noted in 37a95862c6 (alternates: re-allow relative paths from environment, 2016-11-07): - Now that we compare the real paths, duplicate detection is no longer foiled by relative paths. - Using strbuf_realpath() allows us to "normalize" paths that strbuf_normalize_path() can't, so we can stop silently ignoring errors when "normalizing" paths from the environment. - We now store an absolute path based on getcwd() (the "future direction" named in 37a95862c6), so chdir()-ing in the process no longer changes the directory pointed to by the alternate. This is a change in behavior, but a desirable one. Signed-off-by: Glen Choo <chooglen@google.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-24 00:55:31 +00:00
test_expect_success SYMLINKS '--local keeps packs when alternate is objectdir ' '
test_when_finished "rm -rf repo" &&
git init repo &&
test_commit -C repo A &&
(
cd repo &&
git repack -a &&
ls .git/objects/pack/*.pack >../expect &&
ln -s objects .git/alt_objects &&
echo "$(pwd)/.git/alt_objects" >.git/objects/info/alternates &&
git repack -a -d -l &&
ls .git/objects/pack/*.pack >../actual
) &&
test_cmp expect actual
'
repack: disable writing bitmaps when doing a local repack In order to write a bitmap, we need to have full coverage of all objects that are about to be packed. In the traditional non-multi-pack-index world this meant we need to do a full repack of all objects into a single packfile. But in the new multi-pack-index world we can get away with writing bitmaps when we have multiple packfiles as long as the multi-pack-index covers all objects. This is not always the case though. When asked to perform a repack of local objects, only, then we cannot guarantee to have full coverage of all objects regardless of whether we do a full repack or a repack with a multi-pack-index. The end result is that writing the bitmap will fail in both worlds: $ git multi-pack-index write --stdin-packs --bitmap <packfiles warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing) error: could not write multi-pack bitmap Now there are two different ways to fix this. The first one would be to amend git-multi-pack-index(1) to disable writing bitmaps when we notice that we don't have full object coverage. - We don't have enough information in git-multi-pack-index(1) in order to tell whether the local repository _should_ have full coverage. Because even when connected to an alternate object directory, it may be the case that we still have all objects around in the main object database. - git-multi-pack-index(1) is quite a low-level tool. Automatically disabling functionality that it was asked to provide does not feel like the right thing to do. We can easily fix it at a higher level in git-repack(1) though. When asked to only include local objects via `-l` and when connected to an alternate object directory then we will override the user's ask and disable writing bitmaps with a warning. This is similar to what we do in git-pack-objects(1), where we also disable writing bitmaps in case we omit an object from the pack. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 06:02:12 +00:00
test_expect_success '--local disables writing bitmaps when connected to alternate ODB' '
test_when_finished "rm -rf shared member" &&
git init shared &&
git clone --shared shared member &&
(
cd member &&
test_commit "object" &&
GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adl --write-bitmap-index 2>err &&
cat >expect <<-EOF &&
warning: disabling bitmap writing, as some objects are not being packed
EOF
test_cmp expect err &&
test_path_is_missing .git/objects/pack-*.bitmap
)
'
test_expect_success 'packed obs in alt ODB are repacked even when local repo is packless' '
mkdir alt_objects/pack &&
mv .git/objects/pack/* alt_objects/pack &&
git repack -a &&
test_no_missing_in_packs
'
test_expect_success 'packed obs in alt ODB are repacked when local repo has packs' '
rm -f .git/objects/pack/* &&
echo new_content >>file1 &&
git add file1 &&
test_tick &&
git commit -m more_content &&
git repack &&
git repack -a -d &&
test_no_missing_in_packs
'
test_expect_success 'packed obs in alternate ODB kept pack are repacked' '
# swap the .keep so the commit object is in the pack with .keep
for p in alt_objects/pack/*.pack
do
base_name=$(basename $p .pack) &&
if test_path_is_file alt_objects/pack/$base_name.keep
then
rm alt_objects/pack/$base_name.keep
else
touch alt_objects/pack/$base_name.keep
fi || return 1
done &&
git repack -a -d &&
test_no_missing_in_packs
'
test_expect_success 'packed unreachable obs in alternate ODB are not loosened' '
rm -f alt_objects/pack/*.keep &&
mv .git/objects/pack/* alt_objects/pack/ &&
coid=$(git rev-parse HEAD^{commit}) &&
git reset --hard HEAD^ &&
test_tick &&
git reflog expire --expire=$test_tick --expire-unreachable=$test_tick --all &&
# The pack-objects call on the next line is equivalent to
# git repack -A -d without the call to prune-packed
git pack-objects --honor-pack-keep --non-empty --all --reflog \
--unpack-unreachable </dev/null pack &&
rm -f .git/objects/pack/* &&
mv pack-* .git/objects/pack/ &&
git verify-pack -v -- .git/objects/pack/*.idx >packlist &&
! grep "^$coid " packlist &&
echo >.git/objects/info/alternates &&
test_must_fail git show $coid
'
test_expect_success 'local packed unreachable obs that exist in alternate ODB are not loosened' '
echo $(pwd)/alt_objects >.git/objects/info/alternates &&
echo "$coid" | git pack-objects --non-empty --all --reflog pack &&
rm -f .git/objects/pack/* &&
mv pack-* .git/objects/pack/ &&
# The pack-objects call on the next line is equivalent to
# git repack -A -d without the call to prune-packed
git pack-objects --honor-pack-keep --non-empty --all --reflog \
--unpack-unreachable </dev/null pack &&
rm -f .git/objects/pack/* &&
mv pack-* .git/objects/pack/ &&
git verify-pack -v -- .git/objects/pack/*.idx >packlist &&
! grep "^$coid " &&
echo >.git/objects/info/alternates &&
test_must_fail git show $coid
'
test_expect_success 'objects made unreachable by grafts only are kept' '
test_tick &&
git commit --allow-empty -m "commit 4" &&
H0=$(git rev-parse HEAD) &&
H1=$(git rev-parse HEAD^) &&
H2=$(git rev-parse HEAD^^) &&
echo "$H0 $H2" >.git/info/grafts &&
git reflog expire --expire=$test_tick --expire-unreachable=$test_tick --all &&
git repack -a -d &&
git cat-file -t $H1
'
test_expect_success 'repack --keep-pack' '
test_create_repo keep-pack &&
(
cd keep-pack &&
builtin/repack.c: only repack `.pack`s that exist In 73320e49add (builtin/repack.c: only collect fully-formed packs, 2023-06-07), we switched the check for which packs to collect by starting at the .idx files and looking for matching .pack files. This avoids trying to repack pack-files that have not had their pack-indexes installed yet. However, it does cause maintenance to halt if we find the (problematic, but not insurmountable) case of a .idx file without a corresponding .pack file. In an environment where packfile maintenance is a critical function, such a hard stop is costly and requires human intervention to resolve (by deleting the .idx file). This was not the case before. We successfully repacked through this scenario until the recent change to scan for .idx files. Further, if we are actually in a case where objects are missing, we detect this at a different point during the reachability walk. In other cases, Git prepares its list of packfiles by scanning .idx files and then only adds it to the packfile list if the corresponding .pack file exists. It even does so without a warning! (See add_packed_git() in packfile.c for details.) This case is much less likely to occur than the failures seen before 73320e49add. Packfiles are "installed" by writing the .pack file before the .idx and that process can be interrupted. Packfiles _should_ be deleted by deleting the .idx first, followed by the .pack file, but unlink_pack_path() does not do this: it deletes the .pack _first_, allowing a window where this process could be interrupted. We leave the consideration of changing this order as a separate concern. Knowing that this condition is possible from interrupted Git processes and not other tools lends some weight that Git should be more flexible around this scenario. Add a check to see if the .pack file exists before adding it to the list for repacking. This will stop a number of maintenance failures seen in production but fixed by deleting the .idx files. This brings us closer to the case before 73320e49add in that 'git repack' will not fail when there is an orphaned .idx file, at least, not due to the way we scan for packfiles. In the case that the .pack file was erroneously deleted without copies of its objects in other installed packfiles, then 'git repack' will fail due to the reachable object walk. This does resolve the case where automated repacks will no longer be halted on this case. The tests in t7700 show both these successful scenarios and the case of failing if the .pack was truly required. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-11 17:32:34 +00:00
# avoid producing different packs due to delta/base choices
builtin/repack.c: only collect fully-formed packs To partition the set of packs based on which ones are "kept" (either they have a .keep file, or were otherwise marked via the `--keep-pack` option) and "non-kept" ones (anything else), `git repack` uses its `collect_pack_filenames()` function. Ordinarily, we would rely on a convenience function such as `get_all_packs()` to enumerate and partition the set of packs. But `collect_pack_filenames()` uses `readdir()` directly to read the contents of the "$GIT_DIR/objects/pack" directory, and adds each entry ending in ".pack" to the appropriate list (either kept, or non-kept as above). This is subtly racy, since `collect_pack_filenames()` may see a pack that is not fully staged (i.e., it is missing its ".idx" file). Ordinarily, this doesn't cause a problem. But it can cause issues when generating a cruft pack. This is because `git repack` feeds (among other things) the list of existing kept packs down to `git pack-objects --cruft` to indicate that any kept packs will not be removed from the repository (so that the cruft pack machinery can avoid packing objects that appear in those packs as cruft). But `read_cruft_objects()` lists packfiles by calling `get_all_packs()`. So if a ".pack" file exists (necessary to get that pack to appear to `collect_pack_filenames()`), but doesn't have a corresponding ".idx" file (necessary to get that pack to appear via `get_all_packs()`), we'll complain with: fatal: could not find pack '.tmp-5841-pack-a6b0150558609c323c496ced21de6f4b66589260.pack' Fix the above by teaching `collect_pack_filenames()` to only collect packs with their corresponding `*.idx` files in place, indicating that those packs have been fully staged. There are a couple of things worth noting: - Since each entry in the `extra_keep` list (which contains the `--keep-pack` names) has a `*.pack` suffix, we'll have to swap the suffix from ".pack" to ".idx", and compare that instead. - Since we use the the `fname_kept_list` to figure out which packs to delete (with `git repack -d`), we would have previously deleted a `*.pack` with no index (since the existince of a ".pack" file is necessary and sufficient to include that pack in the list of existing non-kept packs). Now we will leave it alone (since that pack won't appear in the list). This is far more correct behavior, since we don't want to race with a pack being staged. Deleting a partially staged pack is unlikely, however, since the window of time between staging a pack and moving its .idx file into place is miniscule. Note that this window does *not* include the time it takes to receive and index the pack, since the incoming data goes into "$GIT_DIR/objects/tmp_pack_XXXXXX", which does not end in ".pack" and is thus ignored by collect_pack_filenames(). In the future, this function should probably be rewritten as a callback to `for_each_file_in_pack_dir()`, but this is the simplest change we could do in the short-term. Reported-by: Michael Haggerty <mhagger@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-07 10:16:17 +00:00
git config pack.window 0 &&
P1=$(commit_and_pack 1) &&
P2=$(commit_and_pack 2) &&
P3=$(commit_and_pack 3) &&
P4=$(commit_and_pack 4) &&
ls .git/objects/pack/*.pack >old-counts &&
test_line_count = 4 old-counts &&
git repack -a -d --keep-pack $P1 --keep-pack $P4 &&
ls .git/objects/pack/*.pack >new-counts &&
grep -q $P1 new-counts &&
grep -q $P4 new-counts &&
test_line_count = 3 new-counts &&
builtin/repack.c: only collect fully-formed packs To partition the set of packs based on which ones are "kept" (either they have a .keep file, or were otherwise marked via the `--keep-pack` option) and "non-kept" ones (anything else), `git repack` uses its `collect_pack_filenames()` function. Ordinarily, we would rely on a convenience function such as `get_all_packs()` to enumerate and partition the set of packs. But `collect_pack_filenames()` uses `readdir()` directly to read the contents of the "$GIT_DIR/objects/pack" directory, and adds each entry ending in ".pack" to the appropriate list (either kept, or non-kept as above). This is subtly racy, since `collect_pack_filenames()` may see a pack that is not fully staged (i.e., it is missing its ".idx" file). Ordinarily, this doesn't cause a problem. But it can cause issues when generating a cruft pack. This is because `git repack` feeds (among other things) the list of existing kept packs down to `git pack-objects --cruft` to indicate that any kept packs will not be removed from the repository (so that the cruft pack machinery can avoid packing objects that appear in those packs as cruft). But `read_cruft_objects()` lists packfiles by calling `get_all_packs()`. So if a ".pack" file exists (necessary to get that pack to appear to `collect_pack_filenames()`), but doesn't have a corresponding ".idx" file (necessary to get that pack to appear via `get_all_packs()`), we'll complain with: fatal: could not find pack '.tmp-5841-pack-a6b0150558609c323c496ced21de6f4b66589260.pack' Fix the above by teaching `collect_pack_filenames()` to only collect packs with their corresponding `*.idx` files in place, indicating that those packs have been fully staged. There are a couple of things worth noting: - Since each entry in the `extra_keep` list (which contains the `--keep-pack` names) has a `*.pack` suffix, we'll have to swap the suffix from ".pack" to ".idx", and compare that instead. - Since we use the the `fname_kept_list` to figure out which packs to delete (with `git repack -d`), we would have previously deleted a `*.pack` with no index (since the existince of a ".pack" file is necessary and sufficient to include that pack in the list of existing non-kept packs). Now we will leave it alone (since that pack won't appear in the list). This is far more correct behavior, since we don't want to race with a pack being staged. Deleting a partially staged pack is unlikely, however, since the window of time between staging a pack and moving its .idx file into place is miniscule. Note that this window does *not* include the time it takes to receive and index the pack, since the incoming data goes into "$GIT_DIR/objects/tmp_pack_XXXXXX", which does not end in ".pack" and is thus ignored by collect_pack_filenames(). In the future, this function should probably be rewritten as a callback to `for_each_file_in_pack_dir()`, but this is the simplest change we could do in the short-term. Reported-by: Michael Haggerty <mhagger@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-07 10:16:17 +00:00
git fsck &&
P5=$(commit_and_pack --no-tag 5) &&
git reset --hard HEAD^ &&
git reflog expire --all --expire=all &&
rm -f ".git/objects/pack/${P5%.pack}.idx" &&
rm -f ".git/objects/info/commit-graph" &&
for from in $(find .git/objects/pack -type f -name "${P5%.pack}.*")
do
to="$(dirname "$from")/.tmp-1234-$(basename "$from")" &&
mv "$from" "$to" || return 1
done &&
builtin/repack.c: only repack `.pack`s that exist In 73320e49add (builtin/repack.c: only collect fully-formed packs, 2023-06-07), we switched the check for which packs to collect by starting at the .idx files and looking for matching .pack files. This avoids trying to repack pack-files that have not had their pack-indexes installed yet. However, it does cause maintenance to halt if we find the (problematic, but not insurmountable) case of a .idx file without a corresponding .pack file. In an environment where packfile maintenance is a critical function, such a hard stop is costly and requires human intervention to resolve (by deleting the .idx file). This was not the case before. We successfully repacked through this scenario until the recent change to scan for .idx files. Further, if we are actually in a case where objects are missing, we detect this at a different point during the reachability walk. In other cases, Git prepares its list of packfiles by scanning .idx files and then only adds it to the packfile list if the corresponding .pack file exists. It even does so without a warning! (See add_packed_git() in packfile.c for details.) This case is much less likely to occur than the failures seen before 73320e49add. Packfiles are "installed" by writing the .pack file before the .idx and that process can be interrupted. Packfiles _should_ be deleted by deleting the .idx first, followed by the .pack file, but unlink_pack_path() does not do this: it deletes the .pack _first_, allowing a window where this process could be interrupted. We leave the consideration of changing this order as a separate concern. Knowing that this condition is possible from interrupted Git processes and not other tools lends some weight that Git should be more flexible around this scenario. Add a check to see if the .pack file exists before adding it to the list for repacking. This will stop a number of maintenance failures seen in production but fixed by deleting the .idx files. This brings us closer to the case before 73320e49add in that 'git repack' will not fail when there is an orphaned .idx file, at least, not due to the way we scan for packfiles. In the case that the .pack file was erroneously deleted without copies of its objects in other installed packfiles, then 'git repack' will fail due to the reachable object walk. This does resolve the case where automated repacks will no longer be halted on this case. The tests in t7700 show both these successful scenarios and the case of failing if the .pack was truly required. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-11 17:32:34 +00:00
# A .idx file without a .pack should not stop us from
# repacking what we can.
touch .git/objects/pack/pack-does-not-exist.idx &&
builtin/repack.c: only collect fully-formed packs To partition the set of packs based on which ones are "kept" (either they have a .keep file, or were otherwise marked via the `--keep-pack` option) and "non-kept" ones (anything else), `git repack` uses its `collect_pack_filenames()` function. Ordinarily, we would rely on a convenience function such as `get_all_packs()` to enumerate and partition the set of packs. But `collect_pack_filenames()` uses `readdir()` directly to read the contents of the "$GIT_DIR/objects/pack" directory, and adds each entry ending in ".pack" to the appropriate list (either kept, or non-kept as above). This is subtly racy, since `collect_pack_filenames()` may see a pack that is not fully staged (i.e., it is missing its ".idx" file). Ordinarily, this doesn't cause a problem. But it can cause issues when generating a cruft pack. This is because `git repack` feeds (among other things) the list of existing kept packs down to `git pack-objects --cruft` to indicate that any kept packs will not be removed from the repository (so that the cruft pack machinery can avoid packing objects that appear in those packs as cruft). But `read_cruft_objects()` lists packfiles by calling `get_all_packs()`. So if a ".pack" file exists (necessary to get that pack to appear to `collect_pack_filenames()`), but doesn't have a corresponding ".idx" file (necessary to get that pack to appear via `get_all_packs()`), we'll complain with: fatal: could not find pack '.tmp-5841-pack-a6b0150558609c323c496ced21de6f4b66589260.pack' Fix the above by teaching `collect_pack_filenames()` to only collect packs with their corresponding `*.idx` files in place, indicating that those packs have been fully staged. There are a couple of things worth noting: - Since each entry in the `extra_keep` list (which contains the `--keep-pack` names) has a `*.pack` suffix, we'll have to swap the suffix from ".pack" to ".idx", and compare that instead. - Since we use the the `fname_kept_list` to figure out which packs to delete (with `git repack -d`), we would have previously deleted a `*.pack` with no index (since the existince of a ".pack" file is necessary and sufficient to include that pack in the list of existing non-kept packs). Now we will leave it alone (since that pack won't appear in the list). This is far more correct behavior, since we don't want to race with a pack being staged. Deleting a partially staged pack is unlikely, however, since the window of time between staging a pack and moving its .idx file into place is miniscule. Note that this window does *not* include the time it takes to receive and index the pack, since the incoming data goes into "$GIT_DIR/objects/tmp_pack_XXXXXX", which does not end in ".pack" and is thus ignored by collect_pack_filenames(). In the future, this function should probably be rewritten as a callback to `for_each_file_in_pack_dir()`, but this is the simplest change we could do in the short-term. Reported-by: Michael Haggerty <mhagger@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-07 10:16:17 +00:00
git repack --cruft -d --keep-pack $P1 --keep-pack $P4 &&
ls .git/objects/pack/*.pack >newer-counts &&
test_cmp new-counts newer-counts &&
git fsck
)
'
builtin/repack.c: only repack `.pack`s that exist In 73320e49add (builtin/repack.c: only collect fully-formed packs, 2023-06-07), we switched the check for which packs to collect by starting at the .idx files and looking for matching .pack files. This avoids trying to repack pack-files that have not had their pack-indexes installed yet. However, it does cause maintenance to halt if we find the (problematic, but not insurmountable) case of a .idx file without a corresponding .pack file. In an environment where packfile maintenance is a critical function, such a hard stop is costly and requires human intervention to resolve (by deleting the .idx file). This was not the case before. We successfully repacked through this scenario until the recent change to scan for .idx files. Further, if we are actually in a case where objects are missing, we detect this at a different point during the reachability walk. In other cases, Git prepares its list of packfiles by scanning .idx files and then only adds it to the packfile list if the corresponding .pack file exists. It even does so without a warning! (See add_packed_git() in packfile.c for details.) This case is much less likely to occur than the failures seen before 73320e49add. Packfiles are "installed" by writing the .pack file before the .idx and that process can be interrupted. Packfiles _should_ be deleted by deleting the .idx first, followed by the .pack file, but unlink_pack_path() does not do this: it deletes the .pack _first_, allowing a window where this process could be interrupted. We leave the consideration of changing this order as a separate concern. Knowing that this condition is possible from interrupted Git processes and not other tools lends some weight that Git should be more flexible around this scenario. Add a check to see if the .pack file exists before adding it to the list for repacking. This will stop a number of maintenance failures seen in production but fixed by deleting the .idx files. This brings us closer to the case before 73320e49add in that 'git repack' will not fail when there is an orphaned .idx file, at least, not due to the way we scan for packfiles. In the case that the .pack file was erroneously deleted without copies of its objects in other installed packfiles, then 'git repack' will fail due to the reachable object walk. This does resolve the case where automated repacks will no longer be halted on this case. The tests in t7700 show both these successful scenarios and the case of failing if the .pack was truly required. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-11 17:32:34 +00:00
test_expect_success 'repacking fails when missing .pack actually means missing objects' '
test_create_repo idx-without-pack &&
(
cd idx-without-pack &&
# Avoid producing different packs due to delta/base choices
git config pack.window 0 &&
P1=$(commit_and_pack 1) &&
P2=$(commit_and_pack 2) &&
P3=$(commit_and_pack 3) &&
P4=$(commit_and_pack 4) &&
ls .git/objects/pack/*.pack >old-counts &&
test_line_count = 4 old-counts &&
# Remove one .pack file
rm .git/objects/pack/$P2 &&
ls .git/objects/pack/*.pack >before-pack-dir &&
test_must_fail git fsck &&
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default In 7a5d604443 (commit: detect commits that exist in commit-graph but not in the ODB, 2023-10-31), we have introduced a new object existence check into `repo_parse_commit_internal()` so that we do not parse commits via the commit-graph that don't have a corresponding object in the object database. This new check of course comes with a performance penalty, which the commit put at around 30% for `git rev-list --topo-order`. But there are in fact scenarios where the performance regression is even higher. The following benchmark against linux.git with a fully-build commit-graph: Benchmark 1: git.v2.42.1 rev-list --count HEAD Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms] Range (min … max): 650.2 ms … 666.0 ms 10 runs Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s] Range (min … max): 1.302 s … 1.361 s 10 runs Summary git.v2.42.1 rev-list --count HEAD ran 2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD While it's a noble goal to ensure that results are the same regardless of whether or not we have a potentially stale commit-graph, taking twice as much time is a tough sell. Furthermore, we can generally assume that the commit-graph will be updated by git-gc(1) or git-maintenance(1) as required so that the case where the commit-graph is stale should not at all be common. With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore the behaviour and thus performance previous to the mentioned commit. In order to not be inconsistent, also disable this behaviour by default in `lookup_commit_in_graph()`, where the object existence check has been introduced right at its inception via f559d6d45e (revision: avoid hitting packfiles when commits are in commit-graph, 2021-08-09). This results in another speedup in commands that end up calling this function, even though it's less pronounced compared to the above benchmark. The following has been executed in linux.git with ~1.2 million references: Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s] Range (min … max): 2.943 s … 2.949 s 3 runs Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s] Range (min … max): 2.704 s … 2.759 s 3 runs Summary GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran 1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted So whereas 7a5d604443 initially introduced the logic to start doing an object existence check in `repo_parse_commit_internal()` by default, the updated logic will now instead cause `lookup_commit_in_graph()` to stop doing the check by default. This behaviour continues to be tweakable by the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable. Note that this requires us to amend some tests to manually turn on the paranoid checks again. This is because we cause repository corruption by manually deleting objects which are part of the commit graph already. These circumstances shouldn't usually happen in repositories. Reported-by: Jeff King <peff@peff.net> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-24 11:08:21 +00:00
test_must_fail env GIT_COMMIT_GRAPH_PARANOIA=true git repack --cruft -d 2>err &&
builtin/repack.c: only repack `.pack`s that exist In 73320e49add (builtin/repack.c: only collect fully-formed packs, 2023-06-07), we switched the check for which packs to collect by starting at the .idx files and looking for matching .pack files. This avoids trying to repack pack-files that have not had their pack-indexes installed yet. However, it does cause maintenance to halt if we find the (problematic, but not insurmountable) case of a .idx file without a corresponding .pack file. In an environment where packfile maintenance is a critical function, such a hard stop is costly and requires human intervention to resolve (by deleting the .idx file). This was not the case before. We successfully repacked through this scenario until the recent change to scan for .idx files. Further, if we are actually in a case where objects are missing, we detect this at a different point during the reachability walk. In other cases, Git prepares its list of packfiles by scanning .idx files and then only adds it to the packfile list if the corresponding .pack file exists. It even does so without a warning! (See add_packed_git() in packfile.c for details.) This case is much less likely to occur than the failures seen before 73320e49add. Packfiles are "installed" by writing the .pack file before the .idx and that process can be interrupted. Packfiles _should_ be deleted by deleting the .idx first, followed by the .pack file, but unlink_pack_path() does not do this: it deletes the .pack _first_, allowing a window where this process could be interrupted. We leave the consideration of changing this order as a separate concern. Knowing that this condition is possible from interrupted Git processes and not other tools lends some weight that Git should be more flexible around this scenario. Add a check to see if the .pack file exists before adding it to the list for repacking. This will stop a number of maintenance failures seen in production but fixed by deleting the .idx files. This brings us closer to the case before 73320e49add in that 'git repack' will not fail when there is an orphaned .idx file, at least, not due to the way we scan for packfiles. In the case that the .pack file was erroneously deleted without copies of its objects in other installed packfiles, then 'git repack' will fail due to the reachable object walk. This does resolve the case where automated repacks will no longer be halted on this case. The tests in t7700 show both these successful scenarios and the case of failing if the .pack was truly required. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-11 17:32:34 +00:00
grep "bad object" err &&
# Before failing, the repack did not modify the
# pack directory.
ls .git/objects/pack/*.pack >after-pack-dir &&
test_cmp before-pack-dir after-pack-dir
)
'
test_expect_success 'bitmaps are created by default in bare repos' '
git clone --bare .git bare.git &&
rm -f bare.git/objects/pack/*.bitmap &&
GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
git -C bare.git repack -ad &&
bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
test_path_is_file "$bitmap"
'
test_expect_success 'incremental repack does not complain' '
git -C bare.git repack -q 2>repack.err &&
test_must_be_empty repack.err
'
test_expect_success 'bitmaps can be disabled on bare repos' '
GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
git -c repack.writeBitmaps=false -C bare.git repack -ad &&
bitmap=$(ls bare.git/objects/pack/*.bitmap || :) &&
test -z "$bitmap"
'
test_expect_success 'no bitmaps created if .keep files present' '
pack=$(ls bare.git/objects/pack/*.pack) &&
test_path_is_file "$pack" &&
keep=${pack%.pack}.keep &&
test_when_finished "rm -f \"\$keep\"" &&
>"$keep" &&
GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
git -C bare.git repack -ad 2>stderr &&
test_must_be_empty stderr &&
find bare.git/objects/pack/ -type f -name "*.bitmap" >actual &&
test_must_be_empty actual
'
test_expect_success 'auto-bitmaps do not complain if unavailable' '
test_config -C bare.git pack.packSizeLimit 1M &&
blob=$(test-tool genrandom big $((1024*1024)) |
git -C bare.git hash-object -w --stdin) &&
git -C bare.git update-ref refs/tags/big $blob &&
GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
git -C bare.git repack -ad 2>stderr &&
test_must_be_empty stderr &&
find bare.git/objects/pack -type f -name "*.bitmap" >actual &&
test_must_be_empty actual
'
repack: add `--filter=<filter-spec>` option This new option puts the objects specified by `<filter-spec>` into a separate packfile. This could be useful if, for example, some blobs take up a lot of precious space on fast storage while they are rarely accessed. It could make sense to move them into a separate cheaper, though slower, storage. It's possible to find which new packfile contains the filtered out objects using one of the following: - `git verify-pack -v ...`, - `test-tool find-pack ...`, which a previous commit added, - `--filter-to=<dir>`, which a following commit will add to specify where the pack containing the filtered out objects will be. This feature is implemented by running `git pack-objects` twice in a row. The first command is run with `--filter=<filter-spec>`, using the specified filter. It packs objects while omitting the objects specified by the filter. Then another `git pack-objects` command is launched using `--stdin-packs`. We pass it all the previously existing packs into its stdin, so that it will pack all the objects in the previously existing packs. But we also pass into its stdin, the pack created by the previous `git pack-objects --filter=<filter-spec>` command as well as the kept packs, all prefixed with '^', so that the objects in these packs will be omitted from the resulting pack. The result is that only the objects filtered out by the first `git pack-objects` command are in the pack resulting from the second `git pack-objects` command. As the interactions with kept packs are a bit tricky, a few related tests are added. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 16:55:01 +00:00
test_expect_success 'repacking with a filter works' '
git -C bare.git repack -a -d &&
test_stdout_line_count = 1 ls bare.git/objects/pack/*.pack &&
git -C bare.git -c repack.writebitmaps=false repack -a -d --filter=blob:none &&
test_stdout_line_count = 2 ls bare.git/objects/pack/*.pack &&
commit_pack=$(test-tool -C bare.git find-pack -c 1 HEAD) &&
blob_pack=$(test-tool -C bare.git find-pack -c 1 HEAD:file1) &&
test "$commit_pack" != "$blob_pack" &&
tree_pack=$(test-tool -C bare.git find-pack -c 1 HEAD^{tree}) &&
test "$tree_pack" = "$commit_pack" &&
blob_pack2=$(test-tool -C bare.git find-pack -c 1 HEAD:file2) &&
test "$blob_pack2" = "$blob_pack"
'
test_expect_success '--filter fails with --write-bitmap-index' '
test_must_fail \
env GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
git -C bare.git repack -a -d --write-bitmap-index --filter=blob:none
'
test_expect_success 'repacking with two filters works' '
git init two-filters &&
(
cd two-filters &&
mkdir subdir &&
test_commit foo &&
test_commit subdir_bar subdir/bar &&
test_commit subdir_baz subdir/baz
) &&
git clone --no-local --bare two-filters two-filters.git &&
(
cd two-filters.git &&
test_stdout_line_count = 1 ls objects/pack/*.pack &&
git -c repack.writebitmaps=false repack -a -d \
--filter=blob:none --filter=tree:1 &&
test_stdout_line_count = 2 ls objects/pack/*.pack &&
commit_pack=$(test-tool find-pack -c 1 HEAD) &&
blob_pack=$(test-tool find-pack -c 1 HEAD:foo.t) &&
root_tree_pack=$(test-tool find-pack -c 1 HEAD^{tree}) &&
subdir_tree_hash=$(git ls-tree --object-only HEAD -- subdir) &&
subdir_tree_pack=$(test-tool find-pack -c 1 "$subdir_tree_hash") &&
# Root tree and subdir tree are not in the same packfiles
test "$commit_pack" != "$blob_pack" &&
test "$commit_pack" = "$root_tree_pack" &&
test "$blob_pack" = "$subdir_tree_pack"
)
'
prepare_for_keep_packs () {
git init keep-packs &&
(
cd keep-packs &&
test_commit foo &&
test_commit bar
) &&
git clone --no-local --bare keep-packs keep-packs.git &&
(
cd keep-packs.git &&
# Create two packs
# The first pack will contain all of the objects except one blob
git rev-list --objects --all >objs &&
grep -v "bar.t" objs | git pack-objects pack &&
# The second pack will contain the excluded object and be kept
packid=$(grep "bar.t" objs | git pack-objects pack) &&
>pack-$packid.keep &&
# Replace the existing pack with the 2 new ones
rm -f objects/pack/pack* &&
mv pack-* objects/pack/
)
}
test_expect_success '--filter works with .keep packs' '
prepare_for_keep_packs &&
(
cd keep-packs.git &&
foo_pack=$(test-tool find-pack -c 1 HEAD:foo.t) &&
bar_pack=$(test-tool find-pack -c 1 HEAD:bar.t) &&
head_pack=$(test-tool find-pack -c 1 HEAD) &&
test "$foo_pack" != "$bar_pack" &&
test "$foo_pack" = "$head_pack" &&
git -c repack.writebitmaps=false repack -a -d --filter=blob:none &&
foo_pack_1=$(test-tool find-pack -c 1 HEAD:foo.t) &&
bar_pack_1=$(test-tool find-pack -c 1 HEAD:bar.t) &&
head_pack_1=$(test-tool find-pack -c 1 HEAD) &&
# Object bar is still only in the old .keep pack
test "$foo_pack_1" != "$foo_pack" &&
test "$bar_pack_1" = "$bar_pack" &&
test "$head_pack_1" != "$head_pack" &&
test "$foo_pack_1" != "$bar_pack_1" &&
test "$foo_pack_1" != "$head_pack_1" &&
test "$bar_pack_1" != "$head_pack_1"
)
'
test_expect_success '--filter works with --pack-kept-objects and .keep packs' '
rm -rf keep-packs keep-packs.git &&
prepare_for_keep_packs &&
(
cd keep-packs.git &&
foo_pack=$(test-tool find-pack -c 1 HEAD:foo.t) &&
bar_pack=$(test-tool find-pack -c 1 HEAD:bar.t) &&
head_pack=$(test-tool find-pack -c 1 HEAD) &&
test "$foo_pack" != "$bar_pack" &&
test "$foo_pack" = "$head_pack" &&
git -c repack.writebitmaps=false repack -a -d --filter=blob:none \
--pack-kept-objects &&
foo_pack_1=$(test-tool find-pack -c 1 HEAD:foo.t) &&
test-tool find-pack -c 2 HEAD:bar.t >bar_pack_1 &&
head_pack_1=$(test-tool find-pack -c 1 HEAD) &&
test "$foo_pack_1" != "$foo_pack" &&
test "$foo_pack_1" != "$bar_pack" &&
test "$head_pack_1" != "$head_pack" &&
# Object bar is in both the old .keep pack and the new
# pack that contained the filtered out objects
grep "$bar_pack" bar_pack_1 &&
grep "$foo_pack_1" bar_pack_1 &&
test "$foo_pack_1" != "$head_pack_1"
)
'
repack: implement `--filter-to` for storing filtered out objects A previous commit has implemented `git repack --filter=<filter-spec>` to allow users to filter out some objects from the main pack and move them into a new different pack. It would be nice if this new different pack could be created in a different directory than the regular pack. This would make it possible to move large blobs into a pack on a different kind of storage, for example cheaper storage. Even in a different directory, this pack can be accessible if, for example, the Git alternates mechanism is used to point to it. In fact not using the Git alternates mechanism can corrupt a repo as the generated pack containing the filtered objects might not be accessible from the repo any more. So setting up the Git alternates mechanism should be done before using this feature if the user wants the repo to be fully usable while this feature is used. In some cases, like when a repo has just been cloned or when there is no other activity in the repo, it's Ok to setup the Git alternates mechanism afterwards though. It's also Ok to just inspect the generated packfile containing the filtered objects and then just move it into the '.git/objects/pack/' directory manually. That's why it's not necessary for this command to check that the Git alternates mechanism has been already setup. While at it, as an example to show that `--filter` and `--filter-to` work well with other options, let's also add a test to check that these options work well with `--max-pack-size`. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 16:55:03 +00:00
test_expect_success '--filter-to stores filtered out objects' '
git -C bare.git repack -a -d &&
test_stdout_line_count = 1 ls bare.git/objects/pack/*.pack &&
git init --bare filtered.git &&
git -C bare.git -c repack.writebitmaps=false repack -a -d \
--filter=blob:none \
--filter-to=../filtered.git/objects/pack/pack &&
test_stdout_line_count = 1 ls bare.git/objects/pack/pack-*.pack &&
test_stdout_line_count = 1 ls filtered.git/objects/pack/pack-*.pack &&
commit_pack=$(test-tool -C bare.git find-pack -c 1 HEAD) &&
blob_pack=$(test-tool -C bare.git find-pack -c 0 HEAD:file1) &&
blob_hash=$(git -C bare.git rev-parse HEAD:file1) &&
test -n "$blob_hash" &&
blob_pack=$(test-tool -C filtered.git find-pack -c 1 $blob_hash) &&
echo $(pwd)/filtered.git/objects >bare.git/objects/info/alternates &&
blob_pack=$(test-tool -C bare.git find-pack -c 1 HEAD:file1) &&
blob_content=$(git -C bare.git show $blob_hash) &&
test "$blob_content" = "content1"
'
test_expect_success '--filter works with --max-pack-size' '
rm -rf filtered.git &&
git init --bare filtered.git &&
git init max-pack-size &&
(
cd max-pack-size &&
test_commit base &&
# two blobs which exceed the maximum pack size
test-tool genrandom foo 1048576 >foo &&
git hash-object -w foo &&
test-tool genrandom bar 1048576 >bar &&
git hash-object -w bar &&
git add foo bar &&
git commit -m "adding foo and bar"
) &&
git clone --no-local --bare max-pack-size max-pack-size.git &&
(
cd max-pack-size.git &&
git -c repack.writebitmaps=false repack -a -d --filter=blob:none \
--max-pack-size=1M \
--filter-to=../filtered.git/objects/pack/pack &&
echo $(cd .. && pwd)/filtered.git/objects >objects/info/alternates &&
# Check that the 3 blobs are in different packfiles in filtered.git
test_stdout_line_count = 3 ls ../filtered.git/objects/pack/pack-*.pack &&
test_stdout_line_count = 1 ls objects/pack/pack-*.pack &&
foo_pack=$(test-tool find-pack -c 1 HEAD:foo) &&
bar_pack=$(test-tool find-pack -c 1 HEAD:bar) &&
base_pack=$(test-tool find-pack -c 1 HEAD:base.t) &&
test "$foo_pack" != "$bar_pack" &&
test "$foo_pack" != "$base_pack" &&
test "$bar_pack" != "$base_pack" &&
for pack in "$foo_pack" "$bar_pack" "$base_pack"
do
case "$foo_pack" in */filtered.git/objects/pack/*) true ;; *) return 1 ;; esac
done
)
'
builtin/repack.c: support writing a MIDX while repacking Teach `git repack` a new `--write-midx` option for callers that wish to persist a multi-pack index in their repository while repacking. There are two existing alternatives to this new flag, but they don't cover our particular use-case. These alternatives are: - Call 'git multi-pack-index write' after running 'git repack', or - Set 'GIT_TEST_MULTI_PACK_INDEX=1' in your environment when running 'git repack'. The former works, but introduces a gap in bitmap coverage between repacking and writing a new MIDX (since the repack may have deleted a pack included in the existing MIDX, invalidating it altogether). Setting the 'GIT_TEST_' environment variable is obviously unsupported. In fact, even if it were supported officially, it still wouldn't work, because it generates the MIDX *after* redundant packs have been dropped, leading to the same issue as above. Introduce a new option which eliminates this race by teaching `git repack` to generate the MIDX at the critical point: after the new packs have been written and moved into place, but before the redundant packs have been removed. This option is compatible with `git repack`'s '--bitmap' option (it changes the interpretation to be: "write a bitmap corresponding to the MIDX after one has been generated"). There is a little bit of additional noise in the patch below to avoid repeating ourselves when selecting which packs to delete. Instead of a single loop as before (where we iterate over 'existing_packs', decide if a pack is worth deleting, and if so, delete it), we have two loops (the first where we decide which ones are worth deleting, and the second where we actually do the deleting). This makes it so we have a single check we can make consistently when (1) telling the MIDX which packs we want to exclude, and (2) actually unlinking the redundant packs. There is also a tiny change to short-circuit the body of write_midx_included_packs() when no packs remain in the case of an empty repository. The MIDX code does not handle this, so avoid trying to generate a MIDX covering zero packs in the first place. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-29 01:55:18 +00:00
objdir=.git/objects
midx=$objdir/pack/multi-pack-index
test_expect_success 'setup for --write-midx tests' '
git init midx &&
(
cd midx &&
git config core.multiPackIndex true &&
test_commit base
)
'
test_expect_success '--write-midx unchanged' '
(
cd midx &&
GIT_TEST_MULTI_PACK_INDEX=0 git repack &&
test_path_is_missing $midx &&
test_path_is_missing $midx-*.bitmap &&
GIT_TEST_MULTI_PACK_INDEX=0 git repack --write-midx &&
test_path_is_file $midx &&
test_path_is_missing $midx-*.bitmap &&
test_midx_consistent $objdir
)
'
test_expect_success '--write-midx with a new pack' '
(
cd midx &&
test_commit loose &&
GIT_TEST_MULTI_PACK_INDEX=0 git repack --write-midx &&
test_path_is_file $midx &&
test_path_is_missing $midx-*.bitmap &&
test_midx_consistent $objdir
)
'
test_expect_success '--write-midx with -b' '
(
cd midx &&
GIT_TEST_MULTI_PACK_INDEX=0 git repack -mb &&
test_path_is_file $midx &&
test_path_is_file $midx-*.bitmap &&
test_midx_consistent $objdir
)
'
test_expect_success '--write-midx with -d' '
(
cd midx &&
test_commit repack &&
GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ad --write-midx &&
test_path_is_file $midx &&
test_path_is_missing $midx-*.bitmap &&
test_midx_consistent $objdir
)
'
test_expect_success 'cleans up MIDX when appropriate' '
(
cd midx &&
test_commit repack-2 &&
GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb --write-midx &&
checksum=$(midx_checksum $objdir) &&
test_path_is_file $midx &&
test_path_is_file $midx-$checksum.bitmap &&
test_commit repack-3 &&
GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb --write-midx &&
test_path_is_file $midx &&
test_path_is_missing $midx-$checksum.bitmap &&
test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
test_commit repack-4 &&
GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb &&
find $objdir/pack -type f -name "multi-pack-index*" >files &&
test_must_be_empty files
)
'
test_expect_success '--write-midx with preferred bitmap tips' '
git init midx-preferred-tips &&
test_when_finished "rm -fr midx-preferred-tips" &&
(
cd midx-preferred-tips &&
test_commit_bulk --message="%s" 103 &&
git log --format="%H" >commits.raw &&
sort <commits.raw >commits &&
git log --format="create refs/tags/%s/%s %H" HEAD >refs &&
git update-ref --stdin <refs &&
t/t7700-repack.sh: fix test breakages with `GIT_TEST_MULTI_PACK_INDEX=1 ` There are a handful of related test breakages which are found when running t/t7700-repack.sh with GIT_TEST_MULTI_PACK_INDEX set to "1" in your environment. Both test failures are the result of something like: git repack --write-midx --write-bitmap-index [...] && test_path_is_file $midx && test_path_is_file $midx-$(midx_checksum $objdir).bitmap , where we repack instructing Git to write a new MIDX and corresponding MIDX bitamp. The error occurs when GIT_TEST_MULTI_PACK_INDEX=1 is found in the enviornment. This causes Git to write out a second MIDX (after processing the builtin's `--write-midx` argument) which is identical to the first, but does not request a bitmap (since we did not set the GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP variable in the environment). Since c528e179662 (pack-bitmap: write multi-pack bitmaps, 2021-08-31), the MIDX machinery will drop an existing MIDX bitmap when rewriting an identical MIDX which does not itself request a corresponding bitmap, which is similar to the way repack itself behaves in the pack-bitmap case. Correct these issues (which date back to [1] and [2], respectively) by explicitly setting GIT_TEST_MULTI_PACK_INDEX to zero before running each command. In the future, we should consider removing GIT_TEST_MULTI_PACK_INDEX, and in general clean up unused GIT_TEST_-variables. But that is a larger effort, and this ensures that we can cleanly run: $ GIT_TEST_MULTI_PACK_INDEX=1 make test in the meantime. [1]: 324efc90d1b (builtin/repack.c: pass `--refs-snapshot` when writing bitmaps, 2021-10-01) [2]: 197443e80ab (repack: don't remove .keep packs with `--pack-kept-objects`, 2022-10-17). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-02 16:26:34 +00:00
GIT_TEST_MULTI_PACK_INDEX=0 \
git repack --write-midx --write-bitmap-index &&
test_path_is_file $midx &&
test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
test-tool bitmap list-commits | sort >bitmaps &&
comm -13 bitmaps commits >before &&
test_line_count = 1 before &&
rm -fr $midx-$(midx_checksum $objdir).bitmap &&
rm -fr $midx &&
# instead of constructing the snapshot ourselves (c.f., the test
# "write a bitmap with --refs-snapshot (preferred tips)" in
# t5326), mark the missing commit as preferred by adding it to
# the pack.preferBitmapTips configuration.
git for-each-ref --format="%(refname:rstrip=1)" \
--points-at="$(cat before)" >missing &&
git config pack.preferBitmapTips "$(cat missing)" &&
git repack --write-midx --write-bitmap-index &&
test-tool bitmap list-commits | sort >bitmaps &&
comm -13 bitmaps commits >after &&
! test_cmp before after
)
'
t7700: check post-condition in kept-pack test The '--write-midx -b packs non-kept objects' test in t7700-repack.sh uses test_subcommand_inexact to check that 'git repack' properly adds the '--honor-pack-keep' flag to the 'git pack-objects' subcommand. However, the test_subcommand_inexact helper is more flexible than initially designed, and this instance is the only one that makes use of it: there are additional arguments between 'git pack-objects' and the '--honor-pack-keep' flag. In order to make test_subcommand_inexact more strict, we need to fix this instance. This test checks that 'git repack --write-midx -a -b -d' will create a new pack-file that does not contain the objects within the kept pack. This behavior is possible because of the multi-pack-index bitmap that will bitmap objects against multiple packs. Without --write-midx, the objects in the kept pack would be duplicated so the resulting pack is closed under reachability and bitmaps can be created against it. This is discussed in more detail in e4d0c11c0 (repack: respect kept objects with '--write-midx -b', 2021-12-20) which also introduced this instance of test_subcommand_inexact. To better verify the intended post-conditions while also removing this instance of test_subcommand_inexact, rewrite the test to check the list of packed objects in the kept pack and the list of the objects in the newly-repacked pack-file _other_ than the kept pack. These lists should be disjoint. Be sure to include a non-kept pack-file and loose objects to be extra careful that this is properly behaving with kept packs and not just avoiding repacking all pack-files. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-25 19:02:46 +00:00
# The first argument is expected to be a filename
# and that file should contain the name of a .idx
# file. Send the list of objects in that .idx file
# into stdout.
get_sorted_objects_from_pack () {
git show-index <$(cat "$1") >raw &&
cut -d" " -f2 raw
}
test_expect_success '--write-midx -b packs non-kept objects' '
t7700: check post-condition in kept-pack test The '--write-midx -b packs non-kept objects' test in t7700-repack.sh uses test_subcommand_inexact to check that 'git repack' properly adds the '--honor-pack-keep' flag to the 'git pack-objects' subcommand. However, the test_subcommand_inexact helper is more flexible than initially designed, and this instance is the only one that makes use of it: there are additional arguments between 'git pack-objects' and the '--honor-pack-keep' flag. In order to make test_subcommand_inexact more strict, we need to fix this instance. This test checks that 'git repack --write-midx -a -b -d' will create a new pack-file that does not contain the objects within the kept pack. This behavior is possible because of the multi-pack-index bitmap that will bitmap objects against multiple packs. Without --write-midx, the objects in the kept pack would be duplicated so the resulting pack is closed under reachability and bitmaps can be created against it. This is discussed in more detail in e4d0c11c0 (repack: respect kept objects with '--write-midx -b', 2021-12-20) which also introduced this instance of test_subcommand_inexact. To better verify the intended post-conditions while also removing this instance of test_subcommand_inexact, rewrite the test to check the list of packed objects in the kept pack and the list of the objects in the newly-repacked pack-file _other_ than the kept pack. These lists should be disjoint. Be sure to include a non-kept pack-file and loose objects to be extra careful that this is properly behaving with kept packs and not just avoiding repacking all pack-files. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-25 19:02:46 +00:00
git init repo &&
test_when_finished "rm -fr repo" &&
(
cd repo &&
# Create a kept pack-file
test_commit base &&
git repack -ad &&
find $objdir/pack -name "*.idx" >before &&
test_line_count = 1 before &&
before_name=$(cat before) &&
>${before_name%.idx}.keep &&
# Create a non-kept pack-file
test_commit other &&
git repack &&
# Create loose objects
test_commit loose &&
# Repack everything
git repack --write-midx -a -b -d &&
# There should be two pack-files now, the
# old, kept pack and the new, non-kept pack.
find $objdir/pack -name "*.idx" | sort >after &&
test_line_count = 2 after &&
find $objdir/pack -name "*.keep" >kept &&
kept_name=$(cat kept) &&
echo ${kept_name%.keep}.idx >kept-idx &&
test_cmp before kept-idx &&
# Get object list from the kept pack.
get_sorted_objects_from_pack before >old.objects &&
# Get object list from the one non-kept pack-file
comm -13 before after >new-pack &&
test_line_count = 1 new-pack &&
get_sorted_objects_from_pack new-pack >new.objects &&
# None of the objects in the new pack should
# exist within the kept pack.
comm -12 old.objects new.objects >shared.objects &&
test_must_be_empty shared.objects
)
'
test_expect_success '--write-midx removes stale pack-based bitmaps' '
rm -fr repo &&
git init repo &&
test_when_finished "rm -fr repo" &&
(
cd repo &&
test_commit base &&
GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ab &&
pack_bitmap=$(ls $objdir/pack/pack-*.bitmap) &&
test_path_is_file "$pack_bitmap" &&
test_commit tip &&
GIT_TEST_MULTI_PACK_INDEX=0 git repack -bm &&
test_path_is_file $midx &&
test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
test_path_is_missing $pack_bitmap
)
'
repack: don't remove .keep packs with `--pack-kept-objects` `git repack` supports a `--pack-kept-objects` flag which more or less translates to whether or not we pass `--honor-pack-keep` down to `git pack-objects` when assembling a new pack. This behavior has existed since ee34a2bead (repack: add `repack.packKeptObjects` config var, 2014-03-03). In that commit, the documentation was extended to say: [...] Note that we still do not delete `.keep` packs after `pack-objects` finishes. Unfortunately, this is not the case when `--pack-kept-objects` is combined with a `--geometric` repack. When doing a geometric repack, we include `.keep` packs when enumerating available packs only when `pack_kept_objects` is set. So this all works fine when `--no-pack-kept-objects` (or similar) is given. Kept packs are excluded from the geometric roll-up, so when we go to delete redundant packs (with `-d`), no `.keep` packs appear "below the split" in our geometric progression. But when `--pack-kept-objects` is given, things can go awry. Namely, when a kept pack is included in the list of packs tracked by the `pack_geometry` struct *and* part of the pack roll-up, we will delete the `.keep` pack when we shouldn't. Note that this *doesn't* result in object corruption, since the `.keep` pack's objects are still present in the new pack. But the `.keep` pack itself is removed, which violates our promise from back in ee34a2bead. But there's more. Because `repack` computes the geometric roll-up independently from selecting which packs belong in a MIDX (with `--write-midx`), this can lead to odd behavior. Consider when a `.keep` pack appears below the geometric split (ie., its objects will be part of the new pack we generate). We'll write a MIDX containing the new pack along with the existing `.keep` pack. But because the `.keep` pack appears below the geometric split line, we'll (incorrectly) try to remove it. While this doesn't corrupt the repository, it does cause us to remove the MIDX we just wrote, since removing that pack would invalidate the new MIDX. Funny enough, this behavior became far less noticeable after e4d0c11c04 (repack: respect kept objects with '--write-midx -b', 2021-12-20), which made `pack_kept_objects` be enabled by default only when we were writing a non-MIDX bitmap. But e4d0c11c04 didn't resolve this bug, it just made it harder to notice unless callers explicitly passed `--pack-kept-objects`. The solution is to avoid trying to remove `.keep` packs during `--geometric` repacks, even when they appear below the geometric split line, which is the approach this patch implements. Co-authored-by: Victoria Dye <vdye@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-18 02:26:06 +00:00
test_expect_success '--write-midx with --pack-kept-objects' '
git init repo &&
test_when_finished "rm -fr repo" &&
(
cd repo &&
test_commit one &&
test_commit two &&
one="$(echo "one" | git pack-objects --revs $objdir/pack/pack)" &&
two="$(echo "one..two" | git pack-objects --revs $objdir/pack/pack)" &&
keep="$objdir/pack/pack-$one.keep" &&
touch "$keep" &&
t/t7700-repack.sh: fix test breakages with `GIT_TEST_MULTI_PACK_INDEX=1 ` There are a handful of related test breakages which are found when running t/t7700-repack.sh with GIT_TEST_MULTI_PACK_INDEX set to "1" in your environment. Both test failures are the result of something like: git repack --write-midx --write-bitmap-index [...] && test_path_is_file $midx && test_path_is_file $midx-$(midx_checksum $objdir).bitmap , where we repack instructing Git to write a new MIDX and corresponding MIDX bitamp. The error occurs when GIT_TEST_MULTI_PACK_INDEX=1 is found in the enviornment. This causes Git to write out a second MIDX (after processing the builtin's `--write-midx` argument) which is identical to the first, but does not request a bitmap (since we did not set the GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP variable in the environment). Since c528e179662 (pack-bitmap: write multi-pack bitmaps, 2021-08-31), the MIDX machinery will drop an existing MIDX bitmap when rewriting an identical MIDX which does not itself request a corresponding bitmap, which is similar to the way repack itself behaves in the pack-bitmap case. Correct these issues (which date back to [1] and [2], respectively) by explicitly setting GIT_TEST_MULTI_PACK_INDEX to zero before running each command. In the future, we should consider removing GIT_TEST_MULTI_PACK_INDEX, and in general clean up unused GIT_TEST_-variables. But that is a larger effort, and this ensures that we can cleanly run: $ GIT_TEST_MULTI_PACK_INDEX=1 make test in the meantime. [1]: 324efc90d1b (builtin/repack.c: pass `--refs-snapshot` when writing bitmaps, 2021-10-01) [2]: 197443e80ab (repack: don't remove .keep packs with `--pack-kept-objects`, 2022-10-17). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-02 16:26:34 +00:00
GIT_TEST_MULTI_PACK_INDEX=0 \
repack: don't remove .keep packs with `--pack-kept-objects` `git repack` supports a `--pack-kept-objects` flag which more or less translates to whether or not we pass `--honor-pack-keep` down to `git pack-objects` when assembling a new pack. This behavior has existed since ee34a2bead (repack: add `repack.packKeptObjects` config var, 2014-03-03). In that commit, the documentation was extended to say: [...] Note that we still do not delete `.keep` packs after `pack-objects` finishes. Unfortunately, this is not the case when `--pack-kept-objects` is combined with a `--geometric` repack. When doing a geometric repack, we include `.keep` packs when enumerating available packs only when `pack_kept_objects` is set. So this all works fine when `--no-pack-kept-objects` (or similar) is given. Kept packs are excluded from the geometric roll-up, so when we go to delete redundant packs (with `-d`), no `.keep` packs appear "below the split" in our geometric progression. But when `--pack-kept-objects` is given, things can go awry. Namely, when a kept pack is included in the list of packs tracked by the `pack_geometry` struct *and* part of the pack roll-up, we will delete the `.keep` pack when we shouldn't. Note that this *doesn't* result in object corruption, since the `.keep` pack's objects are still present in the new pack. But the `.keep` pack itself is removed, which violates our promise from back in ee34a2bead. But there's more. Because `repack` computes the geometric roll-up independently from selecting which packs belong in a MIDX (with `--write-midx`), this can lead to odd behavior. Consider when a `.keep` pack appears below the geometric split (ie., its objects will be part of the new pack we generate). We'll write a MIDX containing the new pack along with the existing `.keep` pack. But because the `.keep` pack appears below the geometric split line, we'll (incorrectly) try to remove it. While this doesn't corrupt the repository, it does cause us to remove the MIDX we just wrote, since removing that pack would invalidate the new MIDX. Funny enough, this behavior became far less noticeable after e4d0c11c04 (repack: respect kept objects with '--write-midx -b', 2021-12-20), which made `pack_kept_objects` be enabled by default only when we were writing a non-MIDX bitmap. But e4d0c11c04 didn't resolve this bug, it just made it harder to notice unless callers explicitly passed `--pack-kept-objects`. The solution is to avoid trying to remove `.keep` packs during `--geometric` repacks, even when they appear below the geometric split line, which is the approach this patch implements. Co-authored-by: Victoria Dye <vdye@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-18 02:26:06 +00:00
git repack --write-midx --write-bitmap-index --geometric=2 -d \
--pack-kept-objects &&
test_path_is_file $keep &&
test_path_is_file $midx &&
test_path_is_file $midx-$(midx_checksum $objdir).bitmap
)
'
test_expect_success TTY '--quiet disables progress' '
test_terminal env GIT_PROGRESS_DELAY=0 \
git -C midx repack -ad --quiet --write-midx 2>stderr &&
test_must_be_empty stderr
'
repack: use tempfiles for signal cleanup When git-repack exits due to a signal, it tries to clean up by calling its remove_temporary_files() function, which walks through the packs dir looking for ".tmp-$$-pack-*" files to delete (where "$$" is the pid of the current process). The biggest problem here is that remove_temporary_files() is not safe to call in a signal handler. It uses opendir(), which isn't on the POSIX async-signal-safe list. The details will be platform-specific, but a likely issue is that it needs to allocate memory; if we receive a signal while inside malloc(), etc, we'll conflict on the allocator lock and deadlock with ourselves. We can fix this by just cleaning up the files directly, without walking the directory. We already know the complete list of .tmp-* files that were generated, because we recorded them via populate_pack_exts(). When we find files there, we can use register_tempfile() to record the filenames. If we receive a signal, then the tempfile API will clean them up for us, and it's async-safe and pretty battle-tested. Note that this is slightly racier than the existing scheme. We don't record the filenames until pack-objects tells us the hash over stdout. So during the period between it generating the file and reporting the hash, we'd fail to clean up. However, that period is very small. During most of the pack generation process pack-objects is using its own internal tempfiles. It's only at the very end that it moves them into the names git-repack expects, and then it immediately reports the name to us. Given that cleanup like this is best effort (after all, we may get SIGKILL), this level of race is acceptable. When we register the tempfiles, we'll record them locally and use the result to call rename_tempfile(), rather than renaming by hand. This isn't strictly necessary, as once we've renamed the files they're gone, and the tempfile API's cleanup unlink() would simply become a pointless noop. But managing the lifetimes of the tempfile objects is the cleanest thing to do, and the tempfile pointers naturally fill the same role as the old booleans. This patch also fixes another small problem. We only hook signals, and don't set up an atexit handler. So if we see an error that causes us to die(), we'll leave the .tmp-* files in place. But since the tempfile API handles this for us, this is now fixed for free. The new test covers this by stimulating a failure of pack-objects when generating a cruft pack. Before this patch, the .tmp-* file for the main pack would have been left, but now we correctly clean it up. Two small subtleties on the implementation: - in the renaming loop, we can stop re-constructing fname_old; we only use it when we have a tempfile to rename, so we can just ask the tempfile for its path (which, barring bugs, should be identical) - when renaming fails, our error message mentions fname_old. But since a failed rename_tempfile() invalidates the tempfile struct, we'll lose access to that string. Instead, let's mention the destination filename, which is what most other callers do. Reported-by: Jan Pokorný <poki@fnusa.cz> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22 00:21:54 +00:00
test_expect_success 'clean up .tmp-* packs on error' '
test_must_fail ok=sigpipe git \
repack: use tempfiles for signal cleanup When git-repack exits due to a signal, it tries to clean up by calling its remove_temporary_files() function, which walks through the packs dir looking for ".tmp-$$-pack-*" files to delete (where "$$" is the pid of the current process). The biggest problem here is that remove_temporary_files() is not safe to call in a signal handler. It uses opendir(), which isn't on the POSIX async-signal-safe list. The details will be platform-specific, but a likely issue is that it needs to allocate memory; if we receive a signal while inside malloc(), etc, we'll conflict on the allocator lock and deadlock with ourselves. We can fix this by just cleaning up the files directly, without walking the directory. We already know the complete list of .tmp-* files that were generated, because we recorded them via populate_pack_exts(). When we find files there, we can use register_tempfile() to record the filenames. If we receive a signal, then the tempfile API will clean them up for us, and it's async-safe and pretty battle-tested. Note that this is slightly racier than the existing scheme. We don't record the filenames until pack-objects tells us the hash over stdout. So during the period between it generating the file and reporting the hash, we'd fail to clean up. However, that period is very small. During most of the pack generation process pack-objects is using its own internal tempfiles. It's only at the very end that it moves them into the names git-repack expects, and then it immediately reports the name to us. Given that cleanup like this is best effort (after all, we may get SIGKILL), this level of race is acceptable. When we register the tempfiles, we'll record them locally and use the result to call rename_tempfile(), rather than renaming by hand. This isn't strictly necessary, as once we've renamed the files they're gone, and the tempfile API's cleanup unlink() would simply become a pointless noop. But managing the lifetimes of the tempfile objects is the cleanest thing to do, and the tempfile pointers naturally fill the same role as the old booleans. This patch also fixes another small problem. We only hook signals, and don't set up an atexit handler. So if we see an error that causes us to die(), we'll leave the .tmp-* files in place. But since the tempfile API handles this for us, this is now fixed for free. The new test covers this by stimulating a failure of pack-objects when generating a cruft pack. Before this patch, the .tmp-* file for the main pack would have been left, but now we correctly clean it up. Two small subtleties on the implementation: - in the renaming loop, we can stop re-constructing fname_old; we only use it when we have a tempfile to rename, so we can just ask the tempfile for its path (which, barring bugs, should be identical) - when renaming fails, our error message mentions fname_old. But since a failed rename_tempfile() invalidates the tempfile struct, we'll lose access to that string. Instead, let's mention the destination filename, which is what most other callers do. Reported-by: Jan Pokorný <poki@fnusa.cz> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22 00:21:54 +00:00
-c repack.cruftwindow=bogus \
repack -ad --cruft &&
find $objdir/pack -name '.tmp-*' >tmpfiles &&
test_must_be_empty tmpfiles
'
repack: drop remove_temporary_files() After we've successfully finished the repack, we call remove_temporary_files(), which looks for and removes any files matching ".tmp-$$-pack-*", where $$ is the pid of the current process. But this is pointless. If we make it this far in the process, we've already renamed these tempfiles into place, and there is nothing left to delete. Nor is there a point in trying to call it to clean up when we _aren't_ successful. It's not safe for using in a signal handler, and the previous commit already handed that job over to the tempfile API. It might seem like it would be useful to clean up stray .tmp files left by other invocations of git-repack. But it won't clean those files; it only matches ones with its pid, and leaves the rest. Fortunately, those are cleaned up naturally by successive calls to git-repack; we'll consider .tmp-*.pack the same as normal packfiles, so "repack -ad", etc, will roll up their contents and eventually delete them. The one case that could matter is if pack-objects generates an extension we don't know about, like ".tmp-pack-$$-$hash.some-new-ext". The current code will quietly delete such a file, while after this patch we'd leave it in place. In practice this doesn't happen, and would be indicative of a bug. Leaving the file as cruft is arguably a better behavior, as it means somebody is more likely to eventually notice and fix the bug. If we really wanted to be paranoid, we could scan for and warn about such files, but that seems like overkill. There's nothing to test with regard to the removal of this function. It was doing nothing, so the behavior should be the same. However, we can verify (and protect) our assumption that "repack -ad" will eventually remove stray files by adding a test for that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22 00:21:58 +00:00
test_expect_success 'repack -ad cleans up old .tmp-* packs' '
git rev-parse HEAD >input &&
git pack-objects $objdir/pack/.tmp-1234 <input &&
git repack -ad &&
find $objdir/pack -name '.tmp-*' >tmpfiles &&
test_must_be_empty tmpfiles
'
test_expect_success 'setup for update-server-info' '
git init update-server-info &&
test_commit -C update-server-info message
'
test_server_info_present () {
test_path_is_file update-server-info/.git/objects/info/packs &&
test_path_is_file update-server-info/.git/info/refs
}
test_server_info_missing () {
test_path_is_missing update-server-info/.git/objects/info/packs &&
test_path_is_missing update-server-info/.git/info/refs
}
test_server_info_cleanup () {
rm -f update-server-info/.git/objects/info/packs update-server-info/.git/info/refs &&
test_server_info_missing
}
test_expect_success 'updates server info by default' '
test_server_info_cleanup &&
git -C update-server-info repack &&
test_server_info_present
'
test_expect_success '-n skips updating server info' '
test_server_info_cleanup &&
git -C update-server-info repack -n &&
test_server_info_missing
'
test_expect_success 'repack.updateServerInfo=true updates server info' '
test_server_info_cleanup &&
git -C update-server-info -c repack.updateServerInfo=true repack &&
test_server_info_present
'
test_expect_success 'repack.updateServerInfo=false skips updating server info' '
test_server_info_cleanup &&
git -C update-server-info -c repack.updateServerInfo=false repack &&
test_server_info_missing
'
test_expect_success '-n overrides repack.updateServerInfo=true' '
test_server_info_cleanup &&
git -C update-server-info -c repack.updateServerInfo=true repack -n &&
test_server_info_missing
'
test_done