2008-11-12 17:59:02 +00:00
|
|
|
#!/bin/sh
|
|
|
|
|
|
|
|
test_description='git repack works correctly'
|
|
|
|
|
|
|
|
. ./test-lib.sh
|
builtin/repack.c: support writing a MIDX while repacking
Teach `git repack` a new `--write-midx` option for callers that wish to
persist a multi-pack index in their repository while repacking.
There are two existing alternatives to this new flag, but they don't
cover our particular use-case. These alternatives are:
- Call 'git multi-pack-index write' after running 'git repack', or
- Set 'GIT_TEST_MULTI_PACK_INDEX=1' in your environment when running
'git repack'.
The former works, but introduces a gap in bitmap coverage between
repacking and writing a new MIDX (since the repack may have deleted a
pack included in the existing MIDX, invalidating it altogether).
Setting the 'GIT_TEST_' environment variable is obviously unsupported.
In fact, even if it were supported officially, it still wouldn't work,
because it generates the MIDX *after* redundant packs have been dropped,
leading to the same issue as above.
Introduce a new option which eliminates this race by teaching `git
repack` to generate the MIDX at the critical point: after the new packs
have been written and moved into place, but before the redundant packs
have been removed.
This option is compatible with `git repack`'s '--bitmap' option (it
changes the interpretation to be: "write a bitmap corresponding to the
MIDX after one has been generated").
There is a little bit of additional noise in the patch below to avoid
repeating ourselves when selecting which packs to delete. Instead of a
single loop as before (where we iterate over 'existing_packs', decide if
a pack is worth deleting, and if so, delete it), we have two loops (the
first where we decide which ones are worth deleting, and the second
where we actually do the deleting). This makes it so we have a single
check we can make consistently when (1) telling the MIDX which packs we
want to exclude, and (2) actually unlinking the redundant packs.
There is also a tiny change to short-circuit the body of
write_midx_included_packs() when no packs remain in the case of an empty
repository. The MIDX code does not handle this, so avoid trying to
generate a MIDX covering zero packs in the first place.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
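The two-loop structure described in the message above (decide first, delete later) can be sketched in plain shell. This is only an illustration, not the code in builtin/repack.c: the helper names `select_redundant_packs` and `delete_packs`, and the keep-list file format (one pack basename per line), are assumptions made for the example.

```shell
# Hypothetical helper: print each pack in directory $1 whose basename is
# not listed in the keep-list file $2 (one basename per line).
select_redundant_packs () {
	packdir=$1
	keeplist=$2
	for pack in "$packdir"/*.pack
	do
		# An unmatched glob stays literal; skip it.
		test -f "$pack" || continue
		name=${pack##*/}
		grep -qxF -- "$name" "$keeplist" || printf '%s\n' "$name"
	done
}

# Hypothetical helper: unlink every pack named on stdin, plus its .idx.
delete_packs () {
	packdir=$1
	while IFS= read -r name
	do
		rm -f "$packdir/$name" "$packdir/${name%.pack}.idx"
	done
}
```

A caller would run `select_redundant_packs` once, write the MIDX with that list excluded, and only then feed the same list to `delete_packs`, so both steps agree on which packs are redundant.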
2021-09-29 01:55:18 +00:00
|
|
|
. "${TEST_DIRECTORY}/lib-bitmap.sh"
|
|
|
|
. "${TEST_DIRECTORY}/lib-midx.sh"
|
2021-12-20 14:48:11 +00:00
|
|
|
. "${TEST_DIRECTORY}/lib-terminal.sh"
|
2008-11-12 17:59:02 +00:00
|
|
|
|
2019-12-04 22:03:09 +00:00
|
|
|
commit_and_pack () {
|
2019-11-27 19:53:45 +00:00
|
|
|
test_commit "$@" 1>&2 &&
|
2019-12-04 22:03:24 +00:00
|
|
|
incrpackid=$(git pack-objects --all --unpacked --incremental .git/objects/pack/pack </dev/null) &&
|
builtin/repack.c: only collect fully-formed packs
To partition the set of packs based on which ones are "kept" (either
they have a .keep file, or were otherwise marked via the `--keep-pack`
option) and "non-kept" ones (anything else), `git repack` uses its
`collect_pack_filenames()` function.
Ordinarily, we would rely on a convenience function such as
`get_all_packs()` to enumerate and partition the set of packs. But
`collect_pack_filenames()` uses `readdir()` directly to read the
contents of the "$GIT_DIR/objects/pack" directory, and adds each entry
ending in ".pack" to the appropriate list (either kept, or non-kept as
above).
This is subtly racy, since `collect_pack_filenames()` may see a pack
that is not fully staged (i.e., it is missing its ".idx" file).
Ordinarily, this doesn't cause a problem. But it can cause issues when
generating a cruft pack.
This is because `git repack` feeds (among other things) the list of
existing kept packs down to `git pack-objects --cruft` to indicate that
any kept packs will not be removed from the repository (so that the
cruft pack machinery can avoid packing objects that appear in those
packs as cruft).
But `read_cruft_objects()` lists packfiles by calling `get_all_packs()`.
So if a ".pack" file exists (necessary to get that pack to appear to
`collect_pack_filenames()`), but doesn't have a corresponding ".idx"
file (necessary to get that pack to appear via `get_all_packs()`), we'll
complain with:
fatal: could not find pack '.tmp-5841-pack-a6b0150558609c323c496ced21de6f4b66589260.pack'
Fix the above by teaching `collect_pack_filenames()` to only collect
packs with their corresponding `*.idx` files in place, indicating that
those packs have been fully staged.
There are a couple of things worth noting:
- Since each entry in the `extra_keep` list (which contains the
`--keep-pack` names) has a `*.pack` suffix, we'll have to swap the
suffix from ".pack" to ".idx", and compare that instead.
- Since we use the `fname_kept_list` to figure out which packs to
delete (with `git repack -d`), we would have previously deleted a
`*.pack` with no index (since the existence of a ".pack" file is
necessary and sufficient to include that pack in the list of
existing non-kept packs).
Now we will leave it alone (since that pack won't appear in the
list). This is far more correct behavior, since we don't want
to race with a pack being staged. Deleting a partially staged pack
is unlikely, however, since the window of time between staging a
pack and moving its .idx file into place is minuscule.
Note that this window does *not* include the time it takes to
receive and index the pack, since the incoming data goes into
"$GIT_DIR/objects/tmp_pack_XXXXXX", which does not end in ".pack"
and is thus ignored by collect_pack_filenames().
In the future, this function should probably be rewritten as a callback
to `for_each_file_in_pack_dir()`, but this is the simplest change we
could do in the short-term.
Reported-by: Michael Haggerty <mhagger@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
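The rule described above (collect a pack only when its `*.idx` is already in place) can be approximated in shell as follows. This is a rough sketch rather than the C implementation of `collect_pack_filenames()`; the function name `list_complete_packs` is made up for the example.

```shell
# Hypothetical helper: print the basename of every *.pack in directory $1
# that already has a matching *.idx next to it. Packs without an index
# (still being staged) and files not ending in ".pack" are skipped.
list_complete_packs () {
	packdir=$1
	for pack in "$packdir"/*.pack
	do
		# An unmatched glob stays literal; skip it.
		test -f "$pack" || continue
		# Swap the ".pack" suffix for ".idx" and require that file too.
		test -f "${pack%.pack}.idx" || continue
		printf '%s\n' "${pack##*/}"
	done
}
```

Because the check is made on the `.idx` name, a half-staged pack is neither treated as kept nor scheduled for deletion.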
2023-06-07 10:16:17 +00:00
|
|
|
# Remove any loose object(s) created by test_commit, since they have
|
|
|
|
# already been packed. Leaving these around can create subtly different
|
|
|
|
# packs with `pack-objects`'s `--unpacked` option.
|
|
|
|
git prune-packed 1>&2 &&
|
2019-12-04 22:03:24 +00:00
|
|
|
echo pack-${incrpackid}.pack
|
2018-04-15 15:36:13 +00:00
|
|
|
}
|
|
|
|
|
2019-12-04 22:03:09 +00:00
|
|
|
test_no_missing_in_packs () {
|
|
|
|
myidx=$(ls -1 .git/objects/pack/*.idx) &&
|
|
|
|
test_path_is_file "$myidx" &&
|
|
|
|
git verify-pack -v alt_objects/pack/*.idx >orig.raw &&
|
2019-12-04 22:03:24 +00:00
|
|
|
sed -n -e "s/^\($OID_REGEX\).*/\1/p" orig.raw | sort >orig &&
|
2019-12-04 22:03:09 +00:00
|
|
|
git verify-pack -v $myidx >dest.raw &&
|
|
|
|
cut -d" " -f1 dest.raw | sort >dest &&
|
|
|
|
comm -23 orig dest >missing &&
|
|
|
|
test_must_be_empty missing
|
|
|
|
}
|
|
|
|
|
2019-12-04 22:03:24 +00:00
|
|
|
# we expect $packid and $oid to be defined
|
2019-12-04 22:03:14 +00:00
|
|
|
test_has_duplicate_object () {
|
|
|
|
want_duplicate_object="$1"
|
|
|
|
found_duplicate_object=false
|
|
|
|
for p in .git/objects/pack/*.idx
|
|
|
|
do
|
|
|
|
idx=$(basename $p)
|
2019-12-04 22:03:24 +00:00
|
|
|
test "pack-$packid.idx" = "$idx" && continue
|
2019-12-04 22:03:14 +00:00
|
|
|
git verify-pack -v $p >packlist || return $?
|
2019-12-04 22:03:24 +00:00
|
|
|
if grep "^$oid" packlist
|
2019-12-04 22:03:14 +00:00
|
|
|
then
|
|
|
|
found_duplicate_object=true
|
|
|
|
echo "DUPLICATE OBJECT FOUND"
|
|
|
|
break
|
|
|
|
fi
|
|
|
|
done &&
|
|
|
|
test "$want_duplicate_object" = "$found_duplicate_object"
|
|
|
|
}
|
|
|
|
|
2008-11-12 17:59:05 +00:00
|
|
|
test_expect_success 'objects in packs marked .keep are not repacked' '
|
2019-11-27 19:53:47 +00:00
|
|
|
echo content1 >file1 &&
|
|
|
|
echo content2 >file2 &&
|
2008-11-12 17:59:02 +00:00
|
|
|
git add . &&
|
2010-04-14 22:09:57 +00:00
|
|
|
test_tick &&
|
2008-11-12 17:59:02 +00:00
|
|
|
git commit -m initial_commit &&
|
|
|
|
# Create two packs
|
|
|
|
# The first pack will contain all of the objects except one
|
2019-12-04 22:03:30 +00:00
|
|
|
git rev-list --objects --all >objs &&
|
|
|
|
grep -v file2 objs | git pack-objects pack &&
|
2008-11-12 17:59:02 +00:00
|
|
|
# The second pack will contain the excluded object
|
2019-12-04 22:03:30 +00:00
|
|
|
packid=$(grep file2 objs | git pack-objects pack) &&
|
2019-12-04 22:03:24 +00:00
|
|
|
>pack-$packid.keep &&
|
2019-12-04 22:03:30 +00:00
|
|
|
git verify-pack -v pack-$packid.idx >packlist &&
|
|
|
|
oid=$(head -n 1 packlist | sed -e "s/^\($OID_REGEX\).*/\1/") &&
|
2008-11-12 17:59:02 +00:00
|
|
|
mv pack-* .git/objects/pack/ &&
|
2014-06-11 06:32:45 +00:00
|
|
|
git repack -A -d -l &&
|
2008-11-12 17:59:02 +00:00
|
|
|
git prune-packed &&
|
2019-12-04 22:03:14 +00:00
|
|
|
test_has_duplicate_object false
|
2008-11-12 17:59:02 +00:00
|
|
|
'
|
|
|
|
|
2014-06-10 20:09:23 +00:00
|
|
|
test_expect_success 'writing bitmaps via command-line can duplicate .keep objects' '
|
2019-12-04 22:03:24 +00:00
|
|
|
# build on $oid, $packid, and .keep state from previous
|
2021-08-31 20:52:41 +00:00
|
|
|
GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 git repack -Adbl &&
|
2019-12-04 22:03:14 +00:00
|
|
|
test_has_duplicate_object true
|
repack: add `repack.packKeptObjects` config var
The git-repack command always passes `--honor-pack-keep`
to pack-objects. This has traditionally been a good thing,
as we do not want to duplicate those objects in a new pack,
and we are not going to delete the old pack.
However, when bitmaps are in use, it is important for a full
repack to include all reachable objects, even if they may be
duplicated in a .keep pack. Otherwise, we cannot generate
the bitmaps, as the on-disk format requires the set of
objects in the pack to be fully closed.
Even if the repository does not generally have .keep files,
a simultaneous push could cause a race condition in which a
.keep file exists at the moment of a repack. The repack may
try to include those objects in one of two situations:
1. The pushed .keep pack contains objects that were
already in the repository (e.g., blobs due to a revert of
an old commit).
2. Receive-pack updates the refs, making the objects
reachable, but before it removes the .keep file, the
repack runs.
In either case, we may prefer to duplicate some objects in
the new, full pack, and let the next repack (after the .keep
file is cleaned up) take care of removing them.
This patch introduces both a command-line and config option
to disable the `--honor-pack-keep` option. By default, it
is triggered when pack.writeBitmaps (or `--write-bitmap-index`)
is turned on, but specifying it explicitly can override the
behavior (e.g., in cases where you prefer .keep files to
bitmaps, but only when they are present).
Note that this option just disables the pack-objects
behavior. We still leave packs with a .keep in place, as we
do not necessarily know that we have duplicated all of their
objects.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
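The defaulting rule described above can be written out as a small decision function. This is a sketch under assumptions: the function name `honor_pack_keep_flag` and its string arguments are invented for illustration and do not correspond to any real git interface.

```shell
# Hypothetical helper: decide whether repack should pass
# --honor-pack-keep to pack-objects.
#   $1: explicit pack-kept-objects setting ("true", "false", or "" if unset)
#   $2: whether bitmaps are being written ("true" or "false")
honor_pack_keep_flag () {
	pack_kept=$1
	write_bitmaps=$2
	# With no explicit setting, kept objects are packed only when
	# bitmaps are being written.
	test -n "$pack_kept" || pack_kept=$write_bitmaps
	if test "$pack_kept" != true
	then
		echo --honor-pack-keep
	fi
}
```

An explicit setting always wins, which matches the note above that the option can override the bitmap-driven default in either direction.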
2014-03-03 20:04:20 +00:00
|
|
|
'
|
|
|
|
|
2014-06-10 20:09:23 +00:00
|
|
|
test_expect_success 'writing bitmaps via config can duplicate .keep objects' '
|
2019-12-04 22:03:24 +00:00
|
|
|
# build on $oid, $packid, and .keep state from previous
|
2021-08-31 20:52:41 +00:00
|
|
|
GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
|
|
|
|
git -c repack.writebitmaps=true repack -Adl &&
|
2019-12-04 22:03:14 +00:00
|
|
|
test_has_duplicate_object true
|
2014-03-03 20:04:20 +00:00
|
|
|
'
|
|
|
|
|
2008-11-10 05:59:58 +00:00
|
|
|
test_expect_success 'loose objects in alternate ODB are not repacked' '
|
2008-11-10 05:59:56 +00:00
|
|
|
mkdir alt_objects &&
|
2019-11-27 19:53:47 +00:00
|
|
|
echo $(pwd)/alt_objects >.git/objects/info/alternates &&
|
|
|
|
echo content3 >file3 &&
|
2019-12-04 22:03:24 +00:00
|
|
|
oid=$(GIT_OBJECT_DIRECTORY=alt_objects git hash-object -w file3) &&
|
2008-11-10 05:59:56 +00:00
|
|
|
git add file3 &&
|
2010-04-14 22:09:57 +00:00
|
|
|
test_tick &&
|
2008-11-10 05:59:56 +00:00
|
|
|
git commit -m commit_file3 &&
|
|
|
|
git repack -a -d -l &&
|
|
|
|
git prune-packed &&
|
2019-12-04 22:03:14 +00:00
|
|
|
test_has_duplicate_object false
|
2008-11-10 05:59:56 +00:00
|
|
|
'
|
|
|
|
|
object-file: use real paths when adding alternates
When adding an alternate ODB, we check if the alternate has the same
path as the object dir, and if so, we do nothing. However, that
comparison does not resolve symlinks. This makes it possible to add the
object dir as an alternate, which may result in bad behavior. For
example, it can trick "git repack -a -l -d" (possibly run by "git gc")
into thinking that all packs come from an alternate and delete all
objects.
rm -rf test &&
git clone https://github.com/git/git test &&
(
cd test &&
ln -s objects .git/alt-objects &&
# -c repack.updateserverinfo=false silences a warning about not
# being able to update "info/refs", it isn't needed to show the
# bad behavior
GIT_ALTERNATE_OBJECT_DIRECTORIES=".git/alt-objects" git \
-c repack.updateserverinfo=false repack -a -l -d &&
# It's broken!
git status
# Because there are no more objects!
ls .git/objects/pack
)
Fix this by resolving symlinks and relative paths before comparing the
alternate and object dir. This lets us clean up a number of issues noted
in 37a95862c6 (alternates: re-allow relative paths from environment,
2016-11-07):
- Now that we compare the real paths, duplicate detection is no longer
foiled by relative paths.
- Using strbuf_realpath() allows us to "normalize" paths that
strbuf_normalize_path() can't, so we can stop silently ignoring errors
when "normalizing" paths from the environment.
- We now store an absolute path based on getcwd() (the "future
direction" named in 37a95862c6), so chdir()-ing in the process no
longer changes the directory pointed to by the alternate. This is a
change in behavior, but a desirable one.
Signed-off-by: Glen Choo <chooglen@google.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
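The idea of comparing real paths rather than the spelled paths can be illustrated in shell with `pwd -P`, which resolves symlinks. This is an analogue of the fix for illustration only, not a rendering of `strbuf_realpath()`; the helper name `same_directory` is an assumption.

```shell
# Hypothetical helper: succeed when $1 and $2 name the same directory once
# symlinks and relative components are resolved.
same_directory () {
	a=$(cd -- "$1" >/dev/null 2>&1 && pwd -P) || return 1
	b=$(cd -- "$2" >/dev/null 2>&1 && pwd -P) || return 1
	test "$a" = "$b"
}
```

With a check like this, a symlink such as `.git/alt-objects` pointing at `objects` is recognized as the object directory itself instead of being accepted as a separate alternate.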
2022-11-24 00:55:31 +00:00
|
|
|
test_expect_success SYMLINKS '--local keeps packs when alternate is objectdir' '
|
|
|
|
test_when_finished "rm -rf repo" &&
|
|
|
|
git init repo &&
|
|
|
|
test_commit -C repo A &&
|
|
|
|
(
|
|
|
|
cd repo &&
|
|
|
|
git repack -a &&
|
|
|
|
ls .git/objects/pack/*.pack >../expect &&
|
|
|
|
ln -s objects .git/alt_objects &&
|
|
|
|
echo "$(pwd)/.git/alt_objects" >.git/objects/info/alternates &&
|
|
|
|
git repack -a -d -l &&
|
|
|
|
ls .git/objects/pack/*.pack >../actual
|
|
|
|
) &&
|
|
|
|
test_cmp expect actual
|
|
|
|
'
|
|
|
|
|
2023-04-14 06:02:12 +00:00
|
|
|
test_expect_success '--local disables writing bitmaps when connected to alternate ODB' '
|
|
|
|
test_when_finished "rm -rf shared member" &&
|
|
|
|
|
|
|
|
git init shared &&
|
|
|
|
git clone --shared shared member &&
|
|
|
|
(
|
|
|
|
cd member &&
|
|
|
|
test_commit "object" &&
|
|
|
|
GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adl --write-bitmap-index 2>err &&
|
|
|
|
cat >expect <<-EOF &&
|
|
|
|
warning: disabling bitmap writing, as some objects are not being packed
|
|
|
|
EOF
|
|
|
|
test_cmp expect err &&
|
|
|
|
test_path_is_missing .git/objects/pack/pack-*.bitmap
|
|
|
|
)
|
|
|
|
'
|
|
|
|
|
2008-11-13 00:50:26 +00:00
|
|
|
test_expect_success 'packed obs in alt ODB are repacked even when local repo is packless' '
|
2010-10-31 07:30:58 +00:00
|
|
|
mkdir alt_objects/pack &&
|
2008-11-13 00:50:26 +00:00
|
|
|
mv .git/objects/pack/* alt_objects/pack &&
|
|
|
|
git repack -a &&
|
2019-12-04 22:03:09 +00:00
|
|
|
test_no_missing_in_packs
|
2008-11-13 00:50:26 +00:00
|
|
|
'
|
|
|
|
|
2009-04-24 23:18:53 +00:00
|
|
|
test_expect_success 'packed obs in alt ODB are repacked when local repo has packs' '
|
t7700: demonstrate misbehavior of 'repack -a' when local packs exist
The ability to "...fatten [the] local repository by packing everything that
is needed by the local ref into a single new pack, including things that are
borrowed from alternates"[1] is supposed to be provided by the '-a' or '-A'
options to repack when '-l' is not used, but there is a flaw. For each
pack in the local repository without a .keep file, repack supplies a
--unpacked=<pack> argument to pack-objects.
The --unpacked option to pack-objects, with or without an argument, causes
pack-objects to ignore any object which is packed in a pack not mentioned
in an argument to --unpacked=. So, if there are local packs, and
'repack -a' is called, then any objects which reside in packs accessible
through alternates will _not_ be packed. If there are no local packs, then
no --unpacked argument will be supplied, and repack will behave as expected.
[1] http://mid.gmane.org/7v8wrwidi3.fsf@gitster.siamese.dyndns.org
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-09 22:14:39 +00:00
|
|
|
rm -f .git/objects/pack/* &&
|
2019-11-27 19:53:47 +00:00
|
|
|
echo new_content >>file1 &&
|
2009-01-09 22:14:39 +00:00
|
|
|
git add file1 &&
|
2010-04-14 22:09:57 +00:00
|
|
|
test_tick &&
|
2009-01-09 22:14:39 +00:00
|
|
|
git commit -m more_content &&
|
|
|
|
git repack &&
|
|
|
|
git repack -a -d &&
|
2019-12-04 22:03:09 +00:00
|
|
|
test_no_missing_in_packs
|
2009-01-09 22:14:39 +00:00
|
|
|
'
|
|
|
|
|
2009-03-20 03:47:51 +00:00
|
|
|
test_expect_success 'packed obs in alternate ODB kept pack are repacked' '
|
2009-03-20 03:47:50 +00:00
|
|
|
# swap the .keep so the commit object is in the pack with .keep
|
|
|
|
for p in alt_objects/pack/*.pack
|
|
|
|
do
|
2010-10-31 07:30:58 +00:00
|
|
|
base_name=$(basename $p .pack) &&
|
2019-11-27 19:53:52 +00:00
|
|
|
if test_path_is_file alt_objects/pack/$base_name.keep
|
2009-03-20 03:47:50 +00:00
|
|
|
then
|
|
|
|
rm alt_objects/pack/$base_name.keep
|
|
|
|
else
|
|
|
|
touch alt_objects/pack/$base_name.keep
|
2021-12-09 05:11:15 +00:00
|
|
|
fi || return 1
|
2010-10-31 07:30:58 +00:00
|
|
|
done &&
|
2009-03-20 03:47:50 +00:00
|
|
|
git repack -a -d &&
|
2019-12-04 22:03:09 +00:00
|
|
|
test_no_missing_in_packs
|
2009-03-20 03:47:50 +00:00
|
|
|
'
|
|
|
|
|
2009-03-20 03:47:52 +00:00
|
|
|
test_expect_success 'packed unreachable obs in alternate ODB are not loosened' '
|
2009-03-20 03:47:50 +00:00
|
|
|
rm -f alt_objects/pack/*.keep &&
|
|
|
|
mv .git/objects/pack/* alt_objects/pack/ &&
|
2019-12-04 22:03:24 +00:00
|
|
|
coid=$(git rev-parse HEAD^{commit}) &&
|
2009-03-20 03:47:50 +00:00
|
|
|
git reset --hard HEAD^ &&
|
2010-04-14 22:09:57 +00:00
|
|
|
test_tick &&
|
|
|
|
git reflog expire --expire=$test_tick --expire-unreachable=$test_tick --all &&
|
2009-03-20 03:47:50 +00:00
|
|
|
# The pack-objects call on the next line is equivalent to
|
|
|
|
# git repack -A -d without the call to prune-packed
|
|
|
|
git pack-objects --honor-pack-keep --non-empty --all --reflog \
|
|
|
|
--unpack-unreachable </dev/null pack &&
|
|
|
|
rm -f .git/objects/pack/* &&
|
|
|
|
mv pack-* .git/objects/pack/ &&
|
2019-12-04 22:03:30 +00:00
|
|
|
git verify-pack -v -- .git/objects/pack/*.idx >packlist &&
|
|
|
|
! grep "^$coid " packlist &&
|
2019-11-27 19:53:47 +00:00
|
|
|
echo >.git/objects/info/alternates &&
|
2019-12-04 22:03:24 +00:00
|
|
|
test_must_fail git show $coid
|
2009-03-20 03:47:50 +00:00
|
|
|
'
|
|
|
|
|
2009-03-21 22:26:11 +00:00
|
|
|
test_expect_success 'local packed unreachable obs that exist in alternate ODB are not loosened' '
|
2019-11-27 19:53:47 +00:00
|
|
|
echo $(pwd)/alt_objects >.git/objects/info/alternates &&
|
2019-12-04 22:03:24 +00:00
|
|
|
echo "$coid" | git pack-objects --non-empty --all --reflog pack &&
|
2009-03-21 22:25:30 +00:00
|
|
|
rm -f .git/objects/pack/* &&
|
|
|
|
mv pack-* .git/objects/pack/ &&
|
|
|
|
# The pack-objects call on the next line is equivalent to
|
|
|
|
# git repack -A -d without the call to prune-packed
|
|
|
|
git pack-objects --honor-pack-keep --non-empty --all --reflog \
|
|
|
|
--unpack-unreachable </dev/null pack &&
|
|
|
|
rm -f .git/objects/pack/* &&
|
|
|
|
mv pack-* .git/objects/pack/ &&
|
2019-12-04 22:03:30 +00:00
|
|
|
git verify-pack -v -- .git/objects/pack/*.idx >packlist &&
|
|
|
|
! grep "^$coid " packlist &&
|
2019-11-27 19:53:47 +00:00
|
|
|
echo >.git/objects/info/alternates &&
|
2019-12-04 22:03:24 +00:00
|
|
|
test_must_fail git show $coid
|
2009-03-21 22:25:30 +00:00
|
|
|
'
|
|
|
|
|
2009-07-23 15:33:49 +00:00
|
|
|
test_expect_success 'objects made unreachable by grafts only are kept' '
|
2009-07-23 15:33:45 +00:00
|
|
|
test_tick &&
|
|
|
|
git commit --allow-empty -m "commit 4" &&
|
|
|
|
H0=$(git rev-parse HEAD) &&
|
|
|
|
H1=$(git rev-parse HEAD^) &&
|
|
|
|
H2=$(git rev-parse HEAD^^) &&
|
2019-11-27 19:53:47 +00:00
|
|
|
echo "$H0 $H2" >.git/info/grafts &&
|
2010-04-14 22:09:57 +00:00
|
|
|
git reflog expire --expire=$test_tick --expire-unreachable=$test_tick --all &&
|
2009-07-23 15:33:45 +00:00
|
|
|
git repack -a -d &&
|
|
|
|
git cat-file -t $H1
|
2018-04-15 15:36:12 +00:00
|
|
|
'
|
2009-07-23 15:33:45 +00:00
|
|
|
|
2018-04-15 15:36:13 +00:00
|
|
|
test_expect_success 'repack --keep-pack' '
|
|
|
|
test_create_repo keep-pack &&
|
|
|
|
(
|
|
|
|
cd keep-pack &&
|
builtin/repack.c: only repack `.pack`s that exist
In 73320e49add (builtin/repack.c: only collect fully-formed packs,
2023-06-07), we switched the check for which packs to collect by
starting at the .idx files and looking for matching .pack files. This
avoids trying to repack pack-files that have not had their pack-indexes
installed yet.
However, it does cause maintenance to halt if we find the (problematic,
but not insurmountable) case of a .idx file without a corresponding
.pack file. In an environment where packfile maintenance is a critical
function, such a hard stop is costly and requires human intervention to
resolve (by deleting the .idx file).
This was not the case before. We successfully repacked through this
scenario until the recent change to scan for .idx files.
Further, if we are actually in a case where objects are missing, we
detect this at a different point during the reachability walk.
In other cases, Git prepares its list of packfiles by scanning .idx
files and then only adds it to the packfile list if the corresponding
.pack file exists. It even does so without a warning! (See
add_packed_git() in packfile.c for details.)
This case is much less likely to occur than the failures seen before
73320e49add. Packfiles are "installed" by writing the .pack file before
the .idx and that process can be interrupted. Packfiles _should_ be
deleted by deleting the .idx first, followed by the .pack file, but
unlink_pack_path() does not do this: it deletes the .pack _first_,
allowing a window where this process could be interrupted. We leave the
consideration of changing this order as a separate concern. Knowing that
this condition is possible from interrupted Git processes and not other
tools lends some weight that Git should be more flexible around this
scenario.
Add a check to see if the .pack file exists before adding it to the list
for repacking. This will stop a number of maintenance failures seen in
production but fixed by deleting the .idx files.
This brings us closer to the case before 73320e49add in that 'git
repack' will not fail when there is an orphaned .idx file, at least, not
due to the way we scan for packfiles. In the case that the .pack file
was erroneously deleted without copies of its objects in other installed
packfiles, then 'git repack' will fail due to the reachable object walk.
This means automated repacks will no longer be halted by this
case. The tests in t7700 show both these successful
scenarios and the case of failing if the .pack was truly required.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
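The orphaned-index condition described above (a `.idx` file whose `.pack` is gone) is easy to detect from the shell. The sketch below is an illustration for someone inspecting a repository by hand; the function name `list_orphaned_idx` is hypothetical and is not part of git.

```shell
# Hypothetical helper: print the basename of every *.idx in directory $1
# that has no matching *.pack, i.e. the case repack now skips instead of
# failing on.
list_orphaned_idx () {
	packdir=$1
	for idx in "$packdir"/*.idx
	do
		# An unmatched glob stays literal; skip it.
		test -f "$idx" || continue
		test -f "${idx%.idx}.pack" && continue
		printf '%s\n' "${idx##*/}"
	done
}
```

An empty result means every index has its pack; any name printed is an index that packfile scanning would ignore.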
2023-07-11 17:32:34 +00:00
|
|
|
# avoid producing different packs due to delta/base choices
|
2023-06-07 10:16:17 +00:00
		git config pack.window 0 &&
2018-04-15 15:36:13 +00:00
		P1=$(commit_and_pack 1) &&
		P2=$(commit_and_pack 2) &&
		P3=$(commit_and_pack 3) &&
		P4=$(commit_and_pack 4) &&
		ls .git/objects/pack/*.pack >old-counts &&
		test_line_count = 4 old-counts &&
		git repack -a -d --keep-pack $P1 --keep-pack $P4 &&
		ls .git/objects/pack/*.pack >new-counts &&
		grep -q $P1 new-counts &&
		grep -q $P4 new-counts &&
		test_line_count = 3 new-counts &&
builtin/repack.c: only collect fully-formed packs
To partition the set of packs based on which ones are "kept" (either
they have a .keep file, or were otherwise marked via the `--keep-pack`
option) and "non-kept" ones (anything else), `git repack` uses its
`collect_pack_filenames()` function.
Ordinarily, we would rely on a convenience function such as
`get_all_packs()` to enumerate and partition the set of packs. But
`collect_pack_filenames()` uses `readdir()` directly to read the
contents of the "$GIT_DIR/objects/pack" directory, and adds each entry
ending in ".pack" to the appropriate list (either kept, or non-kept as
above).
This is subtly racy, since `collect_pack_filenames()` may see a pack
that is not fully staged (i.e., it is missing its ".idx" file).
Ordinarily, this doesn't cause a problem. But it can cause issues when
generating a cruft pack.
This is because `git repack` feeds (among other things) the list of
existing kept packs down to `git pack-objects --cruft` to indicate that
any kept packs will not be removed from the repository (so that the
cruft pack machinery can avoid packing objects that appear in those
packs as cruft).
But `read_cruft_objects()` lists packfiles by calling `get_all_packs()`.
So if a ".pack" file exists (necessary to get that pack to appear to
`collect_pack_filenames()`), but doesn't have a corresponding ".idx"
file (necessary to get that pack to appear via `get_all_packs()`), we'll
complain with:
fatal: could not find pack '.tmp-5841-pack-a6b0150558609c323c496ced21de6f4b66589260.pack'
Fix the above by teaching `collect_pack_filenames()` to only collect
packs with their corresponding `*.idx` files in place, indicating that
those packs have been fully staged.
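As a rough illustration of the rule above (a pack counts only once its
".idx" is in place), here is a minimal shell sketch. The helper name
`list_fully_formed_packs` is hypothetical and is not part of Git; the
actual change lives in C in builtin/repack.c and may be structured
differently.

```shell
# list_fully_formed_packs is a hypothetical helper, not part of Git.
# It prints the name of every "*.pack" in the given directory whose
# matching "*.idx" is already in place.
# Note: unlike readdir(), a shell glob skips dot-files such as ".tmp-*".
list_fully_formed_packs () {
	pack_dir=$1
	for pack in "$pack_dir"/*.pack
	do
		# An unmatched glob stays literal, so skip non-files.
		test -f "$pack" || continue
		if test -f "${pack%.pack}.idx"
		then
			basename "$pack"
		fi
	done
}
```

It could be invoked as `list_fully_formed_packs .git/objects/pack`; a pack
that is still missing its index is simply not printed.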
There are a couple of things worth noting:
- Since each entry in the `extra_keep` list (which contains the
`--keep-pack` names) has a `*.pack` suffix, we'll have to swap the
suffix from ".pack" to ".idx", and compare that instead.
- Since we use the `fname_kept_list` to figure out which packs to
delete (with `git repack -d`), we would have previously deleted a
`*.pack` with no index (since the existence of a ".pack" file is
necessary and sufficient to include that pack in the list of
existing non-kept packs).
Now we will leave it alone (since that pack won't appear in the
list). This is far more correct behavior, since we don't want
to race with a pack being staged. Deleting a partially staged pack
is unlikely, however, since the window of time between staging a
pack and moving its .idx file into place is minuscule.
Note that this window does *not* include the time it takes to
receive and index the pack, since the incoming data goes into
"$GIT_DIR/objects/tmp_pack_XXXXXX", which does not end in ".pack"
and is thus ignored by collect_pack_filenames().
In the future, this function should probably be rewritten as a callback
to `for_each_file_in_pack_dir()`, but this is the simplest change we
could do in the short-term.
Reported-by: Michael Haggerty <mhagger@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-07 10:16:17 +00:00
		git fsck &&

		P5=$(commit_and_pack --no-tag 5) &&
		git reset --hard HEAD^ &&
		git reflog expire --all --expire=all &&
		rm -f ".git/objects/pack/${P5%.pack}.idx" &&
		rm -f ".git/objects/info/commit-graph" &&
		for from in $(find .git/objects/pack -type f -name "${P5%.pack}.*")
		do
			to="$(dirname "$from")/.tmp-1234-$(basename "$from")" &&
			mv "$from" "$to" || return 1
		done &&

builtin/repack.c: only repack `.pack`s that exist
In 73320e49add (builtin/repack.c: only collect fully-formed packs,
2023-06-07), we switched the check for which packs to collect by
starting at the .idx files and looking for matching .pack files. This
avoids trying to repack pack-files that have not had their pack-indexes
installed yet.
However, it does cause maintenance to halt if we find the (problematic,
but not insurmountable) case of a .idx file without a corresponding
.pack file. In an environment where packfile maintenance is a critical
function, such a hard stop is costly and requires human intervention to
resolve (by deleting the .idx file).
This was not the case before. We successfully repacked through this
scenario until the recent change to scan for .idx files.
Further, if we are actually in a case where objects are missing, we
detect this at a different point during the reachability walk.
In other cases, Git prepares its list of packfiles by scanning .idx
files and then only adds it to the packfile list if the corresponding
.pack file exists. It even does so without a warning! (See
add_packed_git() in packfile.c for details.)
This case is much less likely to occur than the failures seen before
73320e49add. Packfiles are "installed" by writing the .pack file before
the .idx and that process can be interrupted. Packfiles _should_ be
deleted by deleting the .idx first, followed by the .pack file, but
unlink_pack_path() does not do this: it deletes the .pack _first_,
allowing a window where this process could be interrupted. We leave the
consideration of changing this order as a separate concern. Knowing that
this condition is possible from interrupted Git processes and not other
tools lends some weight that Git should be more flexible around this
scenario.
Add a check to see if the .pack file exists before adding it to the list
for repacking. This will stop a number of maintenance failures seen in
production but fixed by deleting the .idx files.
This brings us closer to the case before 73320e49add in that 'git
repack' will not fail when there is an orphaned .idx file, at least, not
due to the way we scan for packfiles. In the case that the .pack file
was erroneously deleted without copies of its objects in other installed
packfiles, then 'git repack' will fail due to the reachable object walk.
This means automated repacks are no longer halted by an orphaned
.idx file. The tests in t7700 show both these successful
scenarios and the case of failing if the .pack was truly required.
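For operators who want to spot the orphaned-index condition described
above, a minimal shell sketch follows. The helper name `list_orphaned_idx`
is hypothetical and not part of Git; it only reports files and leaves any
cleanup decision to the caller.

```shell
# list_orphaned_idx is a hypothetical helper, not part of Git.
# It prints the name of every "*.idx" in the given directory that has
# no matching "*.pack" next to it.
list_orphaned_idx () {
	pack_dir=$1
	for idx in "$pack_dir"/*.idx
	do
		# An unmatched glob stays literal, so skip non-files.
		test -f "$idx" || continue
		if ! test -f "${idx%.idx}.pack"
		then
			basename "$idx"
		fi
	done
}
```

For example, `list_orphaned_idx .git/objects/pack` would print
`pack-does-not-exist.idx` in the scenario the t7700 test sets up.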
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-11 17:32:34 +00:00
		# A .idx file without a .pack should not stop us from
		# repacking what we can.
		touch .git/objects/pack/pack-does-not-exist.idx &&

2023-06-07 10:16:17 +00:00
		git repack --cruft -d --keep-pack $P1 --keep-pack $P4 &&

		ls .git/objects/pack/*.pack >newer-counts &&
		test_cmp new-counts newer-counts &&
2018-04-15 15:36:13 +00:00
		git fsck
	)
'

2023-07-11 17:32:34 +00:00
test_expect_success 'repacking fails when missing .pack actually means missing objects' '
	test_create_repo idx-without-pack &&
	(
		cd idx-without-pack &&

		# Avoid producing different packs due to delta/base choices
		git config pack.window 0 &&
		P1=$(commit_and_pack 1) &&
		P2=$(commit_and_pack 2) &&
		P3=$(commit_and_pack 3) &&
		P4=$(commit_and_pack 4) &&
		ls .git/objects/pack/*.pack >old-counts &&
		test_line_count = 4 old-counts &&

		# Remove one .pack file
		rm .git/objects/pack/$P2 &&

		ls .git/objects/pack/*.pack >before-pack-dir &&

		test_must_fail git fsck &&
commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
In 7a5d604443 (commit: detect commits that exist in commit-graph but not
in the ODB, 2023-10-31), we have introduced a new object existence check
into `repo_parse_commit_internal()` so that we do not parse commits via
the commit-graph that don't have a corresponding object in the object
database. This new check of course comes with a performance penalty,
which the commit put at around 30% for `git rev-list --topo-order`. But
there are in fact scenarios where the performance regression is even
higher. The following benchmark was run against linux.git with a fully-built
commit-graph:
Benchmark 1: git.v2.42.1 rev-list --count HEAD
Time (mean ± σ): 658.0 ms ± 5.2 ms [User: 613.5 ms, System: 44.4 ms]
Range (min … max): 650.2 ms … 666.0 ms 10 runs
Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD
Time (mean ± σ): 1.333 s ± 0.019 s [User: 1.263 s, System: 0.069 s]
Range (min … max): 1.302 s … 1.361 s 10 runs
Summary
git.v2.42.1 rev-list --count HEAD ran
2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD
While it's a noble goal to ensure that results are the same regardless
of whether or not we have a potentially stale commit-graph, taking twice
as much time is a tough sell. Furthermore, we can generally assume that
the commit-graph will be updated by git-gc(1) or git-maintenance(1) as
required so that the case where the commit-graph is stale should not at
all be common.
With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore
the behaviour and thus performance previous to the mentioned commit. In
order to not be inconsistent, also disable this behaviour by default in
`lookup_commit_in_graph()`, where the object existence check has been
introduced right at its inception via f559d6d45e (revision: avoid
hitting packfiles when commits are in commit-graph, 2021-08-09).
This results in another speedup in commands that end up calling this
function, even though it's less pronounced compared to the above
benchmark. The following has been executed in linux.git with ~1.2
million references:
Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted
Time (mean ± σ): 2.947 s ± 0.003 s [User: 2.412 s, System: 0.534 s]
Range (min … max): 2.943 s … 2.949 s 3 runs
Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted
Time (mean ± σ): 2.724 s ± 0.030 s [User: 2.207 s, System: 0.514 s]
Range (min … max): 2.704 s … 2.759 s 3 runs
Summary
GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran
1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted
So whereas 7a5d604443 initially introduced the logic to start doing an
object existence check in `repo_parse_commit_internal()` by default, the
updated logic will now instead cause `lookup_commit_in_graph()` to stop
doing the check by default. This behaviour continues to be tweakable by
the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable.
Note that this requires us to amend some tests to manually turn on the
paranoid checks again. This is because we cause repository corruption by
manually deleting objects which are part of the commit graph already.
These circumstances shouldn't usually happen in repositories.
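Since the variable is read from the environment, the paranoid checks can
be re-enabled for a single invocation. The wrapper below is a minimal
sketch; `with_commit_graph_paranoia` is a hypothetical name, not a Git
command, and it assumes only that GIT_COMMIT_GRAPH_PARANOIA accepts
"true" as shown in the benchmarks above.

```shell
# with_commit_graph_paranoia is a hypothetical wrapper, not part of Git.
# It runs the given command with the commit-graph existence checks
# switched back on for that one invocation only.
with_commit_graph_paranoia () {
	env GIT_COMMIT_GRAPH_PARANOIA=true "$@"
}

# Example (assumes a Git repository in the current directory):
#   with_commit_graph_paranoia git rev-list --count HEAD
```

This mirrors what the amended tests do with
`env GIT_COMMIT_GRAPH_PARANOIA=true git repack --cruft -d`.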
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-24 11:08:21 +00:00
		test_must_fail env GIT_COMMIT_GRAPH_PARANOIA=true git repack --cruft -d 2>err &&
2023-07-11 17:32:34 +00:00
		grep "bad object" err &&

		# Before failing, the repack did not modify the
		# pack directory.
		ls .git/objects/pack/*.pack >after-pack-dir &&
		test_cmp before-pack-dir after-pack-dir
	)
'

2019-03-14 09:12:54 +00:00
test_expect_success 'bitmaps are created by default in bare repos' '
	git clone --bare .git bare.git &&
2021-08-31 20:52:41 +00:00
	rm -f bare.git/objects/pack/*.bitmap &&
	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
		git -C bare.git repack -ad &&
2019-03-14 09:12:54 +00:00
	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
	test_path_is_file "$bitmap"
'

test_expect_success 'incremental repack does not complain' '
	git -C bare.git repack -q 2>repack.err &&
	test_must_be_empty repack.err
'
2008-11-12 17:59:02 +00:00
|
|
|
|
2019-03-14 09:12:54 +00:00
test_expect_success 'bitmaps can be disabled on bare repos' '
2021-08-31 20:52:41 +00:00
	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
		git -c repack.writeBitmaps=false -C bare.git repack -ad &&
2019-11-27 19:53:45 +00:00
	bitmap=$(ls bare.git/objects/pack/*.bitmap || :) &&
2019-03-14 09:12:54 +00:00
	test -z "$bitmap"
'

2019-06-29 19:13:59 +00:00
test_expect_success 'no bitmaps created if .keep files present' '
	pack=$(ls bare.git/objects/pack/*.pack) &&
	test_path_is_file "$pack" &&
	keep=${pack%.pack}.keep &&
2019-07-31 05:37:36 +00:00
	test_when_finished "rm -f \"\$keep\"" &&
2019-06-29 19:13:59 +00:00
	>"$keep" &&
2021-08-31 20:52:41 +00:00
	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
		git -C bare.git repack -ad 2>stderr &&
2019-07-31 05:40:56 +00:00
	test_must_be_empty stderr &&
2019-06-29 19:13:59 +00:00
	find bare.git/objects/pack/ -type f -name "*.bitmap" >actual &&
	test_must_be_empty actual
'

2019-07-31 05:39:27 +00:00
test_expect_success 'auto-bitmaps do not complain if unavailable' '
	test_config -C bare.git pack.packSizeLimit 1M &&
	blob=$(test-tool genrandom big $((1024*1024)) |
		git -C bare.git hash-object -w --stdin) &&
	git -C bare.git update-ref refs/tags/big $blob &&
2021-08-31 20:52:41 +00:00
	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
		git -C bare.git repack -ad 2>stderr &&
2019-07-31 05:39:27 +00:00
	test_must_be_empty stderr &&
	find bare.git/objects/pack -type f -name "*.bitmap" >actual &&
	test_must_be_empty actual
'

repack: add `--filter=<filter-spec>` option
This new option puts the objects specified by `<filter-spec>` into a
separate packfile.
This could be useful if, for example, some blobs take up a lot of
precious space on fast storage while they are rarely accessed. It could
make sense to move them into a separate cheaper, though slower, storage.
It's possible to find which new packfile contains the filtered out
objects using one of the following:
- `git verify-pack -v ...`,
- `test-tool find-pack ...`, which a previous commit added,
- `--filter-to=<dir>`, which a following commit will add to specify
where the pack containing the filtered out objects will be.
This feature is implemented by running `git pack-objects` twice in a
row. The first command is run with `--filter=<filter-spec>`, using the
specified filter. It packs objects while omitting the objects specified
by the filter. Then another `git pack-objects` command is launched using
`--stdin-packs`. We pass it all the previously existing packs into its
stdin, so that it will pack all the objects in the previously existing
packs. But we also pass into its stdin, the pack created by the previous
`git pack-objects --filter=<filter-spec>` command as well as the kept
packs, all prefixed with '^', so that the objects in these packs will be
omitted from the resulting pack. The result is that only the objects
filtered out by the first `git pack-objects` command are in the pack
resulting from the second `git pack-objects` command.
As the interactions with kept packs are a bit tricky, a few related
tests are added.
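The input handed to the second `git pack-objects --stdin-packs` run, as
described above, can be pictured with a small shell sketch. The helper
name `stdin_packs_input` and the pack names in the usage line are
hypothetical; only the "^" prefix convention comes from the description
above.

```shell
# stdin_packs_input is a hypothetical helper, not part of Git.
# Arguments before "--" are the previously existing packs and are
# printed as-is. Arguments after "--" are the pack written by the
# filtered run plus any kept packs; they are printed with a "^"
# prefix so that their objects are omitted from the resulting pack.
stdin_packs_input () {
	prefix=
	for name in "$@"
	do
		if test "$name" = "--"
		then
			prefix="^"
			continue
		fi
		printf "%s%s\n" "$prefix" "$name"
	done
}
```

For instance, `stdin_packs_input pack-old.pack -- pack-new.pack pack-kept.pack`
prints one plain line followed by two "^"-prefixed lines, which is the
shape of list the second invocation reads on its standard input.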
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 16:55:01 +00:00
test_expect_success 'repacking with a filter works' '
	git -C bare.git repack -a -d &&
	test_stdout_line_count = 1 ls bare.git/objects/pack/*.pack &&
	git -C bare.git -c repack.writebitmaps=false repack -a -d --filter=blob:none &&
	test_stdout_line_count = 2 ls bare.git/objects/pack/*.pack &&
	commit_pack=$(test-tool -C bare.git find-pack -c 1 HEAD) &&
	blob_pack=$(test-tool -C bare.git find-pack -c 1 HEAD:file1) &&
	test "$commit_pack" != "$blob_pack" &&
	tree_pack=$(test-tool -C bare.git find-pack -c 1 HEAD^{tree}) &&
	test "$tree_pack" = "$commit_pack" &&
	blob_pack2=$(test-tool -C bare.git find-pack -c 1 HEAD:file2) &&
	test "$blob_pack2" = "$blob_pack"
'

test_expect_success '--filter fails with --write-bitmap-index' '
	test_must_fail \
		env GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
		git -C bare.git repack -a -d --write-bitmap-index --filter=blob:none
'

test_expect_success 'repacking with two filters works' '
	git init two-filters &&
	(
		cd two-filters &&
		mkdir subdir &&
		test_commit foo &&
		test_commit subdir_bar subdir/bar &&
		test_commit subdir_baz subdir/baz
	) &&
	git clone --no-local --bare two-filters two-filters.git &&
	(
		cd two-filters.git &&
		test_stdout_line_count = 1 ls objects/pack/*.pack &&
		git -c repack.writebitmaps=false repack -a -d \
			--filter=blob:none --filter=tree:1 &&
		test_stdout_line_count = 2 ls objects/pack/*.pack &&
		commit_pack=$(test-tool find-pack -c 1 HEAD) &&
		blob_pack=$(test-tool find-pack -c 1 HEAD:foo.t) &&
		root_tree_pack=$(test-tool find-pack -c 1 HEAD^{tree}) &&
		subdir_tree_hash=$(git ls-tree --object-only HEAD -- subdir) &&
		subdir_tree_pack=$(test-tool find-pack -c 1 "$subdir_tree_hash") &&

		# Root tree and subdir tree are not in the same packfiles
		test "$commit_pack" != "$blob_pack" &&
		test "$commit_pack" = "$root_tree_pack" &&
		test "$blob_pack" = "$subdir_tree_pack"
	)
'

prepare_for_keep_packs () {
	git init keep-packs &&
	(
		cd keep-packs &&
		test_commit foo &&
		test_commit bar
	) &&
	git clone --no-local --bare keep-packs keep-packs.git &&
	(
		cd keep-packs.git &&

		# Create two packs
		# The first pack will contain all of the objects except one blob
		git rev-list --objects --all >objs &&
		grep -v "bar.t" objs | git pack-objects pack &&
		# The second pack will contain the excluded object and be kept
		packid=$(grep "bar.t" objs | git pack-objects pack) &&
		>pack-$packid.keep &&

		# Replace the existing pack with the 2 new ones
		rm -f objects/pack/pack* &&
		mv pack-* objects/pack/
	)
}

test_expect_success '--filter works with .keep packs' '
	prepare_for_keep_packs &&
	(
		cd keep-packs.git &&

		foo_pack=$(test-tool find-pack -c 1 HEAD:foo.t) &&
		bar_pack=$(test-tool find-pack -c 1 HEAD:bar.t) &&
		head_pack=$(test-tool find-pack -c 1 HEAD) &&

		test "$foo_pack" != "$bar_pack" &&
		test "$foo_pack" = "$head_pack" &&

		git -c repack.writebitmaps=false repack -a -d --filter=blob:none &&

		foo_pack_1=$(test-tool find-pack -c 1 HEAD:foo.t) &&
		bar_pack_1=$(test-tool find-pack -c 1 HEAD:bar.t) &&
		head_pack_1=$(test-tool find-pack -c 1 HEAD) &&

		# Object bar is still only in the old .keep pack
		test "$foo_pack_1" != "$foo_pack" &&
		test "$bar_pack_1" = "$bar_pack" &&
		test "$head_pack_1" != "$head_pack" &&

		test "$foo_pack_1" != "$bar_pack_1" &&
		test "$foo_pack_1" != "$head_pack_1" &&
		test "$bar_pack_1" != "$head_pack_1"
	)
'

test_expect_success '--filter works with --pack-kept-objects and .keep packs' '
	rm -rf keep-packs keep-packs.git &&
	prepare_for_keep_packs &&
	(
		cd keep-packs.git &&

		foo_pack=$(test-tool find-pack -c 1 HEAD:foo.t) &&
		bar_pack=$(test-tool find-pack -c 1 HEAD:bar.t) &&
		head_pack=$(test-tool find-pack -c 1 HEAD) &&

		test "$foo_pack" != "$bar_pack" &&
		test "$foo_pack" = "$head_pack" &&

		git -c repack.writebitmaps=false repack -a -d --filter=blob:none \
			--pack-kept-objects &&

		foo_pack_1=$(test-tool find-pack -c 1 HEAD:foo.t) &&
		test-tool find-pack -c 2 HEAD:bar.t >bar_pack_1 &&
		head_pack_1=$(test-tool find-pack -c 1 HEAD) &&

		test "$foo_pack_1" != "$foo_pack" &&
		test "$foo_pack_1" != "$bar_pack" &&
		test "$head_pack_1" != "$head_pack" &&

		# Object bar is in both the old .keep pack and the new
		# pack that contained the filtered out objects
		grep "$bar_pack" bar_pack_1 &&
		grep "$foo_pack_1" bar_pack_1 &&
		test "$foo_pack_1" != "$head_pack_1"
	)
'

2023-10-02 16:55:03 +00:00
test_expect_success '--filter-to stores filtered out objects' '
	git -C bare.git repack -a -d &&
	test_stdout_line_count = 1 ls bare.git/objects/pack/*.pack &&

	git init --bare filtered.git &&
	git -C bare.git -c repack.writebitmaps=false repack -a -d \
		--filter=blob:none \
		--filter-to=../filtered.git/objects/pack/pack &&
	test_stdout_line_count = 1 ls bare.git/objects/pack/pack-*.pack &&
	test_stdout_line_count = 1 ls filtered.git/objects/pack/pack-*.pack &&

	commit_pack=$(test-tool -C bare.git find-pack -c 1 HEAD) &&
	blob_pack=$(test-tool -C bare.git find-pack -c 0 HEAD:file1) &&
	blob_hash=$(git -C bare.git rev-parse HEAD:file1) &&
	test -n "$blob_hash" &&
	blob_pack=$(test-tool -C filtered.git find-pack -c 1 $blob_hash) &&

	echo $(pwd)/filtered.git/objects >bare.git/objects/info/alternates &&
	blob_pack=$(test-tool -C bare.git find-pack -c 1 HEAD:file1) &&
	blob_content=$(git -C bare.git show $blob_hash) &&
	test "$blob_content" = "content1"
'

test_expect_success '--filter works with --max-pack-size' '
|
|
|
|
rm -rf filtered.git &&
|
|
|
|
git init --bare filtered.git &&
|
|
|
|
git init max-pack-size &&
|
|
|
|
(
|
|
|
|
cd max-pack-size &&
|
|
|
|
test_commit base &&
|
|
|
|
# two blobs which exceed the maximum pack size
|
|
|
|
test-tool genrandom foo 1048576 >foo &&
|
|
|
|
git hash-object -w foo &&
|
|
|
|
test-tool genrandom bar 1048576 >bar &&
|
|
|
|
git hash-object -w bar &&
|
|
|
|
git add foo bar &&
|
|
|
|
git commit -m "adding foo and bar"
|
|
|
|
) &&
|
|
|
|
git clone --no-local --bare max-pack-size max-pack-size.git &&
|
|
|
|
(
|
|
|
|
cd max-pack-size.git &&
|
|
|
|
git -c repack.writebitmaps=false repack -a -d --filter=blob:none \
|
|
|
|
--max-pack-size=1M \
|
|
|
|
--filter-to=../filtered.git/objects/pack/pack &&
|
|
|
|
echo $(cd .. && pwd)/filtered.git/objects >objects/info/alternates &&
|
|
|
|
|
|
|
|
# Check that the 3 blobs are in different packfiles in filtered.git
|
|
|
|
test_stdout_line_count = 3 ls ../filtered.git/objects/pack/pack-*.pack &&
|
|
|
|
test_stdout_line_count = 1 ls objects/pack/pack-*.pack &&
|
|
|
|
foo_pack=$(test-tool find-pack -c 1 HEAD:foo) &&
|
|
|
|
bar_pack=$(test-tool find-pack -c 1 HEAD:bar) &&
|
|
|
|
base_pack=$(test-tool find-pack -c 1 HEAD:base.t) &&
|
|
|
|
test "$foo_pack" != "$bar_pack" &&
|
|
|
|
test "$foo_pack" != "$base_pack" &&
|
|
|
|
test "$bar_pack" != "$base_pack" &&
|
|
|
|
for pack in "$foo_pack" "$bar_pack" "$base_pack"
|
|
|
|
do
|
|
|
|
			case "$pack" in */filtered.git/objects/pack/*) true ;; *) return 1 ;; esac
		done
	)
'

builtin/repack.c: support writing a MIDX while repacking
Teach `git repack` a new `--write-midx` option for callers that wish to
persist a multi-pack index in their repository while repacking.
There are two existing alternatives to this new flag, but they don't
cover our particular use-case. These alternatives are:
- Call 'git multi-pack-index write' after running 'git repack', or
- Set 'GIT_TEST_MULTI_PACK_INDEX=1' in your environment when running
'git repack'.
The former works, but introduces a gap in bitmap coverage between
repacking and writing a new MIDX (since the repack may have deleted a
pack included in the existing MIDX, invalidating it altogether).
Setting the 'GIT_TEST_' environment variable is obviously unsupported.
In fact, even if it were supported officially, it still wouldn't work,
because it generates the MIDX *after* redundant packs have been dropped,
leading to the same issue as above.
Introduce a new option which eliminates this race by teaching `git
repack` to generate the MIDX at the critical point: after the new packs
have been written and moved into place, but before the redundant packs
have been removed.
This option is compatible with `git repack`'s '--bitmap' option (it
changes the interpretation to be: "write a bitmap corresponding to the
MIDX after one has been generated").
There is a little bit of additional noise in the patch below to avoid
repeating ourselves when selecting which packs to delete. Instead of a
single loop as before (where we iterate over 'existing_packs', decide if
a pack is worth deleting, and if so, delete it), we have two loops (the
first where we decide which ones are worth deleting, and the second
where we actually do the deleting). This makes it so we have a single
check we can make consistently when (1) telling the MIDX which packs we
want to exclude, and (2) actually unlinking the redundant packs.
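The "decide first, delete later" ordering described above can be sketched in plain shell. This is only an illustration under assumed names: the pack file names, the rule for what counts as redundant, and the `index-input` file are hypothetical and do not reflect the actual code in builtin/repack.c.

```shell
packdir=$(mktemp -d)
touch "$packdir/pack-old1.pack" "$packdir/pack-old2.pack" "$packdir/pack-new.pack"

# Pass 1: decide which packs are redundant; nothing is deleted yet.
# (Assumed rule: everything except the freshly written pack.)
redundant=""
for p in "$packdir"/pack-*.pack
do
	case "$p" in
	*/pack-new.pack) ;;
	*) redundant="$redundant $p" ;;
	esac
done

# The same list decides which packs the index should cover ...
included=""
for p in "$packdir"/pack-*.pack
do
	case " $redundant " in
	*" $p "*) ;;
	*) included="$included ${p##*/}" ;;
	esac
done
printf '%s\n' $included >"$packdir/index-input"

# Pass 2: ... and only afterwards are the redundant packs unlinked.
for p in $redundant
do
	rm -f "$p"
done
```

Because both consumers read the same list, the index input can never name a pack that the second pass is about to remove.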
There is also a tiny change to short-circuit the body of
write_midx_included_packs() when no packs remain in the case of an empty
repository. The MIDX code does not handle this, so avoid trying to
generate a MIDX covering zero packs in the first place.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-29 01:55:18 +00:00
objdir=.git/objects
midx=$objdir/pack/multi-pack-index

test_expect_success 'setup for --write-midx tests' '
	git init midx &&
	(
		cd midx &&
		git config core.multiPackIndex true &&

		test_commit base
	)
'

test_expect_success '--write-midx unchanged' '
	(
		cd midx &&
		GIT_TEST_MULTI_PACK_INDEX=0 git repack &&
		test_path_is_missing $midx &&
		test_path_is_missing $midx-*.bitmap &&

		GIT_TEST_MULTI_PACK_INDEX=0 git repack --write-midx &&

		test_path_is_file $midx &&
		test_path_is_missing $midx-*.bitmap &&
		test_midx_consistent $objdir
	)
'

test_expect_success '--write-midx with a new pack' '
	(
		cd midx &&
		test_commit loose &&

		GIT_TEST_MULTI_PACK_INDEX=0 git repack --write-midx &&

		test_path_is_file $midx &&
		test_path_is_missing $midx-*.bitmap &&
		test_midx_consistent $objdir
	)
'

test_expect_success '--write-midx with -b' '
	(
		cd midx &&
		GIT_TEST_MULTI_PACK_INDEX=0 git repack -mb &&

		test_path_is_file $midx &&
		test_path_is_file $midx-*.bitmap &&
		test_midx_consistent $objdir
	)
'

test_expect_success '--write-midx with -d' '
	(
		cd midx &&
		test_commit repack &&

		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ad --write-midx &&

		test_path_is_file $midx &&
		test_path_is_missing $midx-*.bitmap &&
		test_midx_consistent $objdir
	)
'

test_expect_success 'cleans up MIDX when appropriate' '
	(
		cd midx &&

		test_commit repack-2 &&
		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb --write-midx &&

		checksum=$(midx_checksum $objdir) &&
		test_path_is_file $midx &&
		test_path_is_file $midx-$checksum.bitmap &&

		test_commit repack-3 &&
		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb --write-midx &&

		test_path_is_file $midx &&
		test_path_is_missing $midx-$checksum.bitmap &&
		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&

		test_commit repack-4 &&
		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb &&

		find $objdir/pack -type f -name "multi-pack-index*" >files &&
		test_must_be_empty files
	)
'

2021-10-01 22:38:10 +00:00
test_expect_success '--write-midx with preferred bitmap tips' '
	git init midx-preferred-tips &&
	test_when_finished "rm -fr midx-preferred-tips" &&
	(
		cd midx-preferred-tips &&

		test_commit_bulk --message="%s" 103 &&

		git log --format="%H" >commits.raw &&
		sort <commits.raw >commits &&

		git log --format="create refs/tags/%s/%s %H" HEAD >refs &&
		git update-ref --stdin <refs &&

t/t7700-repack.sh: fix test breakages with `GIT_TEST_MULTI_PACK_INDEX=1`
There are a handful of related test breakages which are found when
running t/t7700-repack.sh with GIT_TEST_MULTI_PACK_INDEX set to "1" in
your environment.
Both test failures are the result of something like:
    git repack --write-midx --write-bitmap-index [...] &&
    test_path_is_file $midx &&
    test_path_is_file $midx-$(midx_checksum $objdir).bitmap
, where we repack instructing Git to write a new MIDX and corresponding
MIDX bitmap.
The error occurs when GIT_TEST_MULTI_PACK_INDEX=1 is found in the
environment. This causes Git to write out a second MIDX (after
processing the builtin's `--write-midx` argument) which is identical to
the first, but does not request a bitmap (since we did not set the
GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP variable in the environment).
Since c528e179662 (pack-bitmap: write multi-pack bitmaps, 2021-08-31),
the MIDX machinery will drop an existing MIDX bitmap when rewriting an
identical MIDX which does not itself request a corresponding bitmap,
which is similar to the way repack itself behaves in the pack-bitmap
case.
Correct these issues (which date back to [1] and [2], respectively) by
explicitly setting GIT_TEST_MULTI_PACK_INDEX to zero before running each
command.
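The per-command override relies on ordinary shell behavior: an assignment placed in front of a command applies to that command only. A minimal sketch, using `sh -c` as a stand-in for `git repack` so that it runs without a repository:

```shell
# Simulate a test run started with the variable exported as "1".
GIT_TEST_MULTI_PACK_INDEX=1
export GIT_TEST_MULTI_PACK_INDEX

# A one-shot assignment overrides the exported value for this command only.
inner=$(GIT_TEST_MULTI_PACK_INDEX=0 sh -c 'echo "$GIT_TEST_MULTI_PACK_INDEX"')

# Later commands still see the value inherited from the environment.
outer=$(sh -c 'echo "$GIT_TEST_MULTI_PACK_INDEX"')

echo "inner=$inner outer=$outer"
```

This prints `inner=0 outer=1`, which is why the rest of the suite keeps honoring the caller's setting even though the tests pin the variable to zero for each affected command.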
In the future, we should consider removing GIT_TEST_MULTI_PACK_INDEX,
and in general clean up unused GIT_TEST_-variables. But that is a larger
effort, and this ensures that we can cleanly run:
$ GIT_TEST_MULTI_PACK_INDEX=1 make test
in the meantime.
[1]: 324efc90d1b (builtin/repack.c: pass `--refs-snapshot` when writing
bitmaps, 2021-10-01)
[2]: 197443e80ab (repack: don't remove .keep packs with
`--pack-kept-objects`, 2022-10-17).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-02 16:26:34 +00:00
		GIT_TEST_MULTI_PACK_INDEX=0 \
2021-10-01 22:38:10 +00:00
		git repack --write-midx --write-bitmap-index &&
		test_path_is_file $midx &&
		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&

		test-tool bitmap list-commits | sort >bitmaps &&
		comm -13 bitmaps commits >before &&
		test_line_count = 1 before &&

		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
		rm -fr $midx &&

		# instead of constructing the snapshot ourselves (c.f., the test
		# "write a bitmap with --refs-snapshot (preferred tips)" in
		# t5326), mark the missing commit as preferred by adding it to
		# the pack.preferBitmapTips configuration.
		git for-each-ref --format="%(refname:rstrip=1)" \
			--points-at="$(cat before)" >missing &&
		git config pack.preferBitmapTips "$(cat missing)" &&
		git repack --write-midx --write-bitmap-index &&

		test-tool bitmap list-commits | sort >bitmaps &&
		comm -13 bitmaps commits >after &&

		! test_cmp before after
	)
'

2022-03-25 19:02:46 +00:00
# The first argument is expected to be a filename
# and that file should contain the name of a .idx
# file. Send the list of objects in that .idx file
# into stdout.
get_sorted_objects_from_pack () {
	git show-index <$(cat "$1") >raw &&
	cut -d" " -f2 raw
}

2021-12-20 14:48:10 +00:00
test_expect_success '--write-midx -b packs non-kept objects' '
2022-03-25 19:02:46 +00:00
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
		cd repo &&

		# Create a kept pack-file
		test_commit base &&
		git repack -ad &&
		find $objdir/pack -name "*.idx" >before &&
		test_line_count = 1 before &&
		before_name=$(cat before) &&
		>${before_name%.idx}.keep &&

		# Create a non-kept pack-file
		test_commit other &&
		git repack &&

		# Create loose objects
		test_commit loose &&

		# Repack everything
		git repack --write-midx -a -b -d &&

		# There should be two pack-files now, the
		# old, kept pack and the new, non-kept pack.
		find $objdir/pack -name "*.idx" | sort >after &&
		test_line_count = 2 after &&
		find $objdir/pack -name "*.keep" >kept &&
		kept_name=$(cat kept) &&
		echo ${kept_name%.keep}.idx >kept-idx &&
		test_cmp before kept-idx &&

		# Get object list from the kept pack.
		get_sorted_objects_from_pack before >old.objects &&

		# Get object list from the one non-kept pack-file
		comm -13 before after >new-pack &&
		test_line_count = 1 new-pack &&
		get_sorted_objects_from_pack new-pack >new.objects &&

		# None of the objects in the new pack should
		# exist within the kept pack.
		comm -12 old.objects new.objects >shared.objects &&
		test_must_be_empty shared.objects
	)
2021-12-20 14:48:10 +00:00
'

2022-10-18 02:45:12 +00:00
test_expect_success '--write-midx removes stale pack-based bitmaps' '
2023-05-20 16:13:54 +00:00
	rm -fr repo &&
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
2022-10-18 02:45:12 +00:00
		cd repo &&
		test_commit base &&
		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ab &&

		pack_bitmap=$(ls $objdir/pack/pack-*.bitmap) &&
		test_path_is_file "$pack_bitmap" &&

		test_commit tip &&
		GIT_TEST_MULTI_PACK_INDEX=0 git repack -bm &&

		test_path_is_file $midx &&
		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
		test_path_is_missing $pack_bitmap
2023-05-20 16:13:54 +00:00
	)
2022-10-18 02:45:12 +00:00
'

repack: don't remove .keep packs with `--pack-kept-objects`
`git repack` supports a `--pack-kept-objects` flag which more or less
translates to whether or not we pass `--honor-pack-keep` down to `git
pack-objects` when assembling a new pack.
This behavior has existed since ee34a2bead (repack: add
`repack.packKeptObjects` config var, 2014-03-03). In that commit, the
documentation was extended to say:
[...] Note that we still do not delete `.keep` packs after
`pack-objects` finishes.
Unfortunately, this is not the case when `--pack-kept-objects` is
combined with a `--geometric` repack. When doing a geometric repack, we
include `.keep` packs when enumerating available packs only when
`pack_kept_objects` is set.
So this all works fine when `--no-pack-kept-objects` (or similar) is
given. Kept packs are excluded from the geometric roll-up, so when we go
to delete redundant packs (with `-d`), no `.keep` packs appear "below
the split" in our geometric progression.
But when `--pack-kept-objects` is given, things can go awry. Namely,
when a kept pack is included in the list of packs tracked by the
`pack_geometry` struct *and* part of the pack roll-up, we will delete
the `.keep` pack when we shouldn't.
Note that this *doesn't* result in object corruption, since the `.keep`
pack's objects are still present in the new pack. But the `.keep` pack
itself is removed, which violates our promise from back in ee34a2bead.
But there's more. Because `repack` computes the geometric roll-up
independently from selecting which packs belong in a MIDX (with
`--write-midx`), this can lead to odd behavior. Consider when a `.keep`
pack appears below the geometric split (i.e., its objects will be part of
the new pack we generate).
We'll write a MIDX containing the new pack along with the existing
`.keep` pack. But because the `.keep` pack appears below the geometric
split line, we'll (incorrectly) try to remove it. While this doesn't
corrupt the repository, it does cause us to remove the MIDX we just
wrote, since removing that pack would invalidate the new MIDX.
Funny enough, this behavior became far less noticeable after e4d0c11c04
(repack: respect kept objects with '--write-midx -b', 2021-12-20), which
made `pack_kept_objects` be enabled by default only when we were writing
a non-MIDX bitmap.
But e4d0c11c04 didn't resolve this bug, it just made it harder to notice
unless callers explicitly passed `--pack-kept-objects`.
The solution is to avoid trying to remove `.keep` packs during
`--geometric` repacks, even when they appear below the geometric split
line, which is the approach this patch implements.
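As a rough illustration of the rule described above, the following sketch skips any deletion candidate that has a sibling `.keep` file. The pack names are hypothetical, and the loop only stands in for the deletion step; it is not the logic in builtin/repack.c.

```shell
packdir=$(mktemp -d)
# Hypothetical pack names: pack-a is kept, pack-b is not.
: >"$packdir/pack-a.pack"
: >"$packdir/pack-a.keep"
: >"$packdir/pack-b.pack"

# Pretend both packs fall below the geometric split and are deletion candidates.
for pack in "$packdir"/pack-*.pack
do
	if test -e "${pack%.pack}.keep"
	then
		continue	# a .keep file protects the pack
	fi
	rm -f "$pack"
done

ls "$packdir"
```

Only `pack-b.pack` is removed; the kept pack and its `.keep` marker survive, so an index written just before the deletion step would still refer to packs that exist.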
Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-18 02:26:06 +00:00
test_expect_success '--write-midx with --pack-kept-objects' '
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
		cd repo &&

		test_commit one &&
		test_commit two &&

		one="$(echo "one" | git pack-objects --revs $objdir/pack/pack)" &&
		two="$(echo "one..two" | git pack-objects --revs $objdir/pack/pack)" &&

		keep="$objdir/pack/pack-$one.keep" &&
		touch "$keep" &&

2024-04-02 16:26:34 +00:00
		GIT_TEST_MULTI_PACK_INDEX=0 \
2022-10-18 02:26:06 +00:00
		git repack --write-midx --write-bitmap-index --geometric=2 -d \
			--pack-kept-objects &&

		test_path_is_file $keep &&
		test_path_is_file $midx &&
		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
	)
'

2021-12-20 14:48:11 +00:00
test_expect_success TTY '--quiet disables progress' '
	test_terminal env GIT_PROGRESS_DELAY=0 \
		git -C midx repack -ad --quiet --write-midx 2>stderr &&
	test_must_be_empty stderr
'

repack: use tempfiles for signal cleanup
When git-repack exits due to a signal, it tries to clean up by calling
its remove_temporary_files() function, which walks through the packs dir
looking for ".tmp-$$-pack-*" files to delete (where "$$" is the pid of
the current process).
The biggest problem here is that remove_temporary_files() is not safe to
call in a signal handler. It uses opendir(), which isn't on the POSIX
async-signal-safe list. The details will be platform-specific, but a
likely issue is that it needs to allocate memory; if we receive a signal
while inside malloc(), etc, we'll conflict on the allocator lock and
deadlock with ourselves.
We can fix this by just cleaning up the files directly, without walking
the directory. We already know the complete list of .tmp-* files that
were generated, because we recorded them via populate_pack_exts(). When
we find files there, we can use register_tempfile() to record the
filenames. If we receive a signal, then the tempfile API will clean them
up for us, and it's async-safe and pretty battle-tested.
Note that this is slightly racier than the existing scheme. We don't
record the filenames until pack-objects tells us the hash over stdout.
So during the period between it generating the file and reporting the
hash, we'd fail to clean up. However, that period is very small. During
most of the pack generation process pack-objects is using its own
internal tempfiles. It's only at the very end that it moves them into
the names git-repack expects, and then it immediately reports the name
to us. Given that cleanup like this is best effort (after all, we may
get SIGKILL), this level of race is acceptable.
When we register the tempfiles, we'll record them locally and use the
result to call rename_tempfile(), rather than renaming by hand. This
isn't strictly necessary, as once we've renamed the files they're gone,
and the tempfile API's cleanup unlink() would simply become a pointless
noop. But managing the lifetimes of the tempfile objects is the cleanest
thing to do, and the tempfile pointers naturally fill the same role as
the old booleans.
This patch also fixes another small problem. We only hook signals, and
don't set up an atexit handler. So if we see an error that causes us to
die(), we'll leave the .tmp-* files in place. But since the tempfile API
handles this for us, this is now fixed for free. The new test covers
this by simulating a failure of pack-objects when generating a cruft
pack. Before this patch, the .tmp-* file for the main pack would have
been left, but now we correctly clean it up.
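A shell analogy may make the idea concrete. The sketch below records each temporary file as it is created and removes exactly those names from an EXIT trap, with no directory walk. It is only an analogy for the approach: the real change uses the C tempfile API, and the file name and the `registered` variable here are made up for illustration.

```shell
workdir=$(mktemp -d)

# Run the stand-in "repack" in a subshell so its EXIT trap fires when it dies.
(
	registered=""
	# Cleanup removes only the names recorded so far.
	trap 'rm -f $registered' EXIT
	trap 'exit 1' INT TERM

	tmp="$workdir/.tmp-$$-pack-1234.pack"
	: >"$tmp"
	registered="$registered $tmp"	# loosely analogous to register_tempfile()

	exit 1	# simulate die() before the rename step
)
status=$?
leftover=$(find "$workdir" -name '.tmp-*')
```

After the subshell exits with a failure, `leftover` is empty: the error path cleaned up without a dedicated handler for it, which mirrors how the die() case comes for free once cleanup is tied to registered names.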
Two small subtleties on the implementation:
- in the renaming loop, we can stop re-constructing fname_old; we only
use it when we have a tempfile to rename, so we can just ask the
tempfile for its path (which, barring bugs, should be identical)
- when renaming fails, our error message mentions fname_old. But since
a failed rename_tempfile() invalidates the tempfile struct, we'll
lose access to that string. Instead, let's mention the destination
filename, which is what most other callers do.
Reported-by: Jan Pokorný <poki@fnusa.cz>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22 00:21:54 +00:00
test_expect_success 'clean up .tmp-* packs on error' '
2022-10-23 17:00:45 +00:00
	test_must_fail ok=sigpipe git \
2022-10-22 00:21:54 +00:00
		-c repack.cruftwindow=bogus \
		repack -ad --cruft &&
	find $objdir/pack -name '.tmp-*' >tmpfiles &&
	test_must_be_empty tmpfiles
'

repack: drop remove_temporary_files()
After we've successfully finished the repack, we call
remove_temporary_files(), which looks for and removes any files matching
".tmp-$$-pack-*", where $$ is the pid of the current process. But this
is pointless. If we make it this far in the process, we've already
renamed these tempfiles into place, and there is nothing left to delete.
Nor is there a point in trying to call it to clean up when we _aren't_
successful. It's not safe to use in a signal handler, and the
previous commit already handed that job over to the tempfile API.
It might seem like it would be useful to clean up stray .tmp files left
by other invocations of git-repack. But it won't clean those files; it
only matches ones with its pid, and leaves the rest. Fortunately, those
are cleaned up naturally by successive calls to git-repack; we'll
consider .tmp-*.pack the same as normal packfiles, so "repack -ad", etc,
will roll up their contents and eventually delete them.
The one case that could matter is if pack-objects generates an extension
we don't know about, like ".tmp-pack-$$-$hash.some-new-ext". The current
code will quietly delete such a file, while after this patch we'd leave
it in place. In practice this doesn't happen, and would be indicative of
a bug. Leaving the file as cruft is arguably a better behavior, as it
means somebody is more likely to eventually notice and fix the bug. If
we really wanted to be paranoid, we could scan for and warn about such
files, but that seems like overkill.
There's nothing to test with regard to the removal of this function. It
was doing nothing, so the behavior should be the same. However, we can
verify (and protect) our assumption that "repack -ad" will eventually
remove stray files by adding a test for that.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22 00:21:58 +00:00
test_expect_success 'repack -ad cleans up old .tmp-* packs' '
	git rev-parse HEAD >input &&
	git pack-objects $objdir/pack/.tmp-1234 <input &&
	git repack -ad &&
	find $objdir/pack -name '.tmp-*' >tmpfiles &&
	test_must_be_empty tmpfiles
'

2022-03-14 07:42:46 +00:00
test_expect_success 'setup for update-server-info' '
	git init update-server-info &&
	test_commit -C update-server-info message
'

test_server_info_present () {
	test_path_is_file update-server-info/.git/objects/info/packs &&
	test_path_is_file update-server-info/.git/info/refs
}

test_server_info_missing () {
	test_path_is_missing update-server-info/.git/objects/info/packs &&
	test_path_is_missing update-server-info/.git/info/refs
}

test_server_info_cleanup () {
	rm -f update-server-info/.git/objects/info/packs update-server-info/.git/info/refs &&
	test_server_info_missing
}

test_expect_success 'updates server info by default' '
	test_server_info_cleanup &&
	git -C update-server-info repack &&
	test_server_info_present
'

test_expect_success '-n skips updating server info' '
	test_server_info_cleanup &&
	git -C update-server-info repack -n &&
	test_server_info_missing
'

2022-03-14 07:42:51 +00:00
test_expect_success 'repack.updateServerInfo=true updates server info' '
	test_server_info_cleanup &&
	git -C update-server-info -c repack.updateServerInfo=true repack &&
	test_server_info_present
'

test_expect_success 'repack.updateServerInfo=false skips updating server info' '
	test_server_info_cleanup &&
	git -C update-server-info -c repack.updateServerInfo=false repack &&
	test_server_info_missing
'

test_expect_success '-n overrides repack.updateServerInfo=true' '
	test_server_info_cleanup &&
	git -C update-server-info -c repack.updateServerInfo=true repack -n &&
	test_server_info_missing
'

2019-03-14 09:12:54 +00:00
test_done