#!/bin/sh
# git/t/t5326-multi-pack-bitmaps.sh

test_description='exercise basic multi-pack bitmap functionality'
. ./test-lib.sh
. "${TEST_DIRECTORY}/lib-bitmap.sh"
# We'll be writing our own midx and bitmaps, so avoid getting confused by the
# automatic ones.
GIT_TEST_MULTI_PACK_INDEX=0
GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0

# midx: read `RIDX` chunk when present (2022-01-25 22:41:17 +00:00)
#
# When a MIDX contains the new `RIDX` chunk, ensure that the reverse index
# is read from it instead of the on-disk .rev file. Since we need to
# encode the object order in the MIDX itself for correctness reasons,
# there is no point in storing the same data again outside of the MIDX.
#
# So, this patch stops writing separate .rev files, and reads it out of
# the MIDX itself. This is possible to do with relatively little new code,
# since the format of the RIDX chunk is identical to the data in the .rev
# file. In other words, we can implement this by pointing the
# `revindex_data` field at the reverse index chunk of the MIDX instead of
# the .rev file without any other changes.
#
# Note that we have two knobs that are adjusted for the new tests:
# GIT_TEST_MIDX_WRITE_REV and GIT_TEST_MIDX_READ_RIDX. The former controls
# whether the MIDX .rev is written at all, and the latter controls whether
# we read the MIDX's RIDX chunk.
#
# Both are necessary to ensure that the test added at the beginning of
# this series continues to work. This is because we always need to write
# the RIDX chunk in the MIDX in order to change its checksum, but we want
# to make sure reading the existing .rev file still works (since the RIDX
# chunk takes precedence by default).
#
# Arguably this isn't a very interesting mode to test, because the
# precedence rules mean that we'll always read the RIDX chunk over the
# .rev file. But it makes it impossible for a user to induce corruption in
# their repository by adjusting the test knobs (since if we had an
# either/or knob they could stop writing the RIDX chunk, allowing them to
# tweak the MIDX's object order without changing its checksum).
#
# Signed-off-by: Taylor Blau <me@ttaylorr.com>
# Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
# Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
# Signed-off-by: Junio C Hamano <gitster@pobox.com>

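
# Sketch (hypothetical helper, not part of Git or lib-bitmap.sh; nothing
# below calls it): list the chunk IDs recorded in a MIDX file's chunk
# lookup table. This could be used, for example, to check whether a given
# MIDX carries the `RIDX` chunk described above. It assumes the documented
# multi-pack-index layout:
#
#   - a 12-byte header whose byte 6 (0-based) is the number of chunks;
#   - 12-byte lookup entries starting at byte 12, each a 4-byte chunk ID
#     followed by an 8-byte offset.
midx_sketch_chunk_ids () {
	# Header byte 6 holds the chunk count; print it as a decimal number.
	midx_sketch_nr=$(dd if="$1" bs=1 skip=6 count=1 2>/dev/null |
		od -An -tu1 | tr -d ' ') &&
	midx_sketch_i=0 &&
	while test "$midx_sketch_i" -lt "$midx_sketch_nr"
	do
		# Each lookup entry begins with its 4-byte chunk ID; print it
		# on a line of its own.
		dd if="$1" bs=1 skip=$((12 + 12 * midx_sketch_i)) count=4 2>/dev/null &&
		echo &&
		midx_sketch_i=$((midx_sketch_i + 1)) || return 1
	done
}
# Usage (assumption): midx_sketch_chunk_ids "$midx" | grep -x RIDX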
# This test exercises multi-pack bitmap functionality where the object order
# is stored in and read from a special chunk within the MIDX, so use the
# default behavior here.
sane_unset GIT_TEST_MIDX_WRITE_REV
sane_unset GIT_TEST_MIDX_READ_RIDX
midx_bitmap_core

bitmap_reuse_tests() {
	from=$1
	to=$2

	test_expect_success "setup pack reuse tests ($from -> $to)" '
		rm -fr repo &&
		git init repo &&
		(
			cd repo &&
			test_commit_bulk 16 &&
			git tag old-tip &&
			git config core.multiPackIndex true &&
			if test "MIDX" = "$from"
			then
				git repack -Ad &&
				git multi-pack-index write --bitmap
			else
				git repack -Adb
			fi
		)
	'

	test_expect_success "build bitmap from existing ($from -> $to)" '
		(
			cd repo &&
			test_commit_bulk --id=further 16 &&
			git tag new-tip &&
			if test "MIDX" = "$to"
			then
				git repack -d &&
				git multi-pack-index write --bitmap
			else
				git repack -Adb
			fi
		)
	'

	test_expect_success "verify resulting bitmaps ($from -> $to)" '
		(
			cd repo &&
			git for-each-ref &&
			git rev-list --test-bitmap refs/tags/old-tip &&
			git rev-list --test-bitmap refs/tags/new-tip
		)
	'
}

bitmap_reuse_tests 'pack' 'MIDX'
bitmap_reuse_tests 'MIDX' 'pack'
bitmap_reuse_tests 'MIDX' 'MIDX'

test_expect_success 'missing object closure fails gracefully' '
	rm -fr repo &&
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
		cd repo &&
		test_commit loose &&
		test_commit packed &&
		# Do not pass "--revs"; we want a pack without the "loose"
		# commit.
		git pack-objects $objdir/pack/pack <<-EOF &&
		$(git rev-parse packed)
		EOF
		test_must_fail git multi-pack-index write --bitmap 2>err &&
		grep "doesn.t have full closure" err &&
		test_path_is_missing $midx
	)
'

midx_bitmap_partial_tests

test_expect_success 'removing a MIDX clears stale bitmaps' '
	rm -fr repo &&
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
		cd repo &&
		test_commit base &&
		git repack &&
		git multi-pack-index write --bitmap &&
		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
		rm $midx &&
		# Then write a new MIDX.
		test_commit new &&
		git repack &&
		git multi-pack-index write --bitmap &&
		test_path_is_file $midx &&
		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
		test_path_is_missing $stale_bitmap
	)
'

test_expect_success 'pack.preferBitmapTips' '
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
		cd repo &&
		test_commit_bulk --message="%s" 103 &&
		git log --format="%H" >commits.raw &&
		sort <commits.raw >commits &&
		git log --format="create refs/tags/%s %H" HEAD >refs &&
		git update-ref --stdin <refs &&
		git multi-pack-index write --bitmap &&
		test_path_is_file $midx &&
		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
		test-tool bitmap list-commits | sort >bitmaps &&
		comm -13 bitmaps commits >before &&
		test_line_count = 1 before &&
		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
			<before | git update-ref --stdin &&
		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
		rm -fr $midx &&
		git -c pack.preferBitmapTips=refs/tags/include \
			multi-pack-index write --bitmap &&
		test-tool bitmap list-commits | sort >bitmaps &&
		comm -13 bitmaps commits >after &&
		! test_cmp before after
	)
'

# midx: preliminary support for `--refs-snapshot` (2021-09-29 01:55:07 +00:00)
#
# To figure out which commits we can write a bitmap for, the multi-pack
# index/bitmap code does a reachability traversal, marking any commit which
# can be found in the MIDX as eligible to receive a bitmap.
#
# This approach will cause a problem when multi-pack bitmaps are able to be
# generated from `git repack`, since the reference tips can change during
# the repack. Even though we ignore commits that don't exist in the MIDX
# (when doing a scan of the ref tips), it's possible that a commit in the
# MIDX reaches something that isn't.
#
# This can happen when a multi-pack index contains some pack which refers
# to loose objects (e.g., if a pack was pushed after starting the repack
# but before generating the MIDX which depends on an object which is stored
# as loose in the repository, and by definition isn't included in the
# multi-pack index).
#
# By taking a snapshot of the references before we start repacking, we can
# close that race window. In the above scenario (where we have a packed
# object pointing at a loose one), we'll either (a) take a snapshot of the
# references before seeing the packed one, or (b) take it after, at which
# point we can guarantee that the loose object will be packed and included
# in the MIDX.
#
# This patch does just that. It writes a temporary "reference snapshot",
# which is a list of OIDs that are at the ref tips before writing a
# multi-pack bitmap. References that are "preferred" (i.e., are a suffix of
# at least one value of the 'pack.preferBitmapTips' configuration) are
# marked with a special '+'.
#
# The format is simple: one line per commit at each tip, with an optional
# '+' at the beginning (for preferred references, as described above).
#
# When provided, the reference snapshot is used to drive bitmap selection
# instead of the MIDX code doing its own traversal. When it isn't provided,
# the usual traversal takes place instead.
#
# Signed-off-by: Taylor Blau <me@ttaylorr.com>
# Signed-off-by: Junio C Hamano <gitster@pobox.com>
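
# Sketch (hypothetical helper, not part of Git; nothing below calls it):
# turn "<oid> <refname>" lines on stdin into the snapshot format described
# above. It emits one OID per line, prefixed with '+' when the ref counts
# as preferred. The following behavior rests on assumptions:
#
#   - "Preferred" is approximated here as a plain string-prefix match
#     against $1. Git's own matching of pack.preferBitmapTips may differ.
#   - Annotated tags are not peeled, so the input should already contain
#     commit OIDs.
#   - An empty $1 marks every ref as preferred.
refs_snapshot_sketch () {
	while read -r snapshot_oid snapshot_ref
	do
		case "$snapshot_ref" in
		"$1"*)
			# Preferred tip: mark it with a leading '+'.
			echo "+$snapshot_oid" ;;
		*)
			# Ordinary tip: the bare OID.
			echo "$snapshot_oid" ;;
		esac
	done
}
# Usage (assumption):
#   git for-each-ref --format="%(objectname) %(refname)" refs/tags |
#   refs_snapshot_sketch refs/tags/include/ >snapshot &&
#   git multi-pack-index write --bitmap --refs-snapshot=snapshot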

test_expect_success 'writing a bitmap with --refs-snapshot' '
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
		cd repo &&
		test_commit one &&
		test_commit two &&
		git rev-parse one >snapshot &&
		git repack -ad &&
		# First, write a MIDX which sees both refs/tags/one and
		# refs/tags/two (causing both of those commits to receive
		# bitmaps).
		git multi-pack-index write --bitmap &&
		test_path_is_file $midx &&
		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
		test-tool bitmap list-commits | sort >bitmaps &&
		grep "$(git rev-parse one)" bitmaps &&
		grep "$(git rev-parse two)" bitmaps &&
		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
		rm -fr $midx &&
		# Then again, but with a refs snapshot which only sees
		# refs/tags/one.
		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
		test_path_is_file $midx &&
		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
		test-tool bitmap list-commits | sort >bitmaps &&
		grep "$(git rev-parse one)" bitmaps &&
		! grep "$(git rev-parse two)" bitmaps
	)
'

test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
		cd repo &&
		test_commit_bulk --message="%s" 103 &&
		git log --format="%H" >commits.raw &&
		sort <commits.raw >commits &&
		git log --format="create refs/tags/%s %H" HEAD >refs &&
		git update-ref --stdin <refs &&
		git multi-pack-index write --bitmap &&
		test_path_is_file $midx &&
		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
		test-tool bitmap list-commits | sort >bitmaps &&
		comm -13 bitmaps commits >before &&
		test_line_count = 1 before &&
		(
			grep -vf before commits.raw &&
			# mark missing commits as preferred
			sed "s/^/+/" before
		) >snapshot &&
		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
		rm -fr $midx &&
		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
		test-tool bitmap list-commits | sort >bitmaps &&
		comm -13 bitmaps commits >after &&
		! test_cmp before after
	)
'

test_expect_success 'hash-cache values are propagated from pack bitmaps' '
	rm -fr repo &&
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
		cd repo &&
		test_commit base &&
		test_commit base2 &&
		git repack -adb &&
		test-tool bitmap dump-hashes >pack.raw &&
		test_file_not_empty pack.raw &&
		sort pack.raw >pack.hashes &&
		test_commit new &&
		git repack &&
		git multi-pack-index write --bitmap &&
		test-tool bitmap dump-hashes >midx.raw &&
		sort midx.raw >midx.hashes &&
		# ensure that every namehash in the pack bitmap can be found in
		# the midx bitmap (i.e., that there are no oid-namehash pairs
		# unique to the pack bitmap).
		comm -23 pack.hashes midx.hashes >dropped.hashes &&
		test_must_be_empty dropped.hashes
	)
'

# midx: prevent writing a .bitmap without any objects (2022-02-09 19:26:47 +00:00)
#
# When trying to write a MIDX, we already prevent the case where there
# weren't any packs present, and thus we would have written an empty MIDX.
#
# But there is another "empty" case, which is more interesting, and we
# don't yet handle. If we try to write a MIDX which has at least one pack,
# but those packs together don't contain any objects, we will encounter a
# BUG() when trying to use the bitmap corresponding to that MIDX, like so:
#
#     $ git rev-parse HEAD | git pack-objects --revs --use-bitmap-index --stdout >/dev/null
#     BUG: pack-revindex.c:394: pack_pos_to_midx: out-of-bounds object at 0
#
# (note that in the above reproduction, both `--use-bitmap-index` and
# `--stdout` are important, since without the former we won't even bother
# to load the .bitmap, and without the latter we won't attempt pack reuse).
#
# The problem occurs when we try to discover the identity of the preferred
# pack to determine which range (if any) of existing packs we can reuse
# verbatim. This path is: `reuse_packfile_objects()` ->
# `reuse_partial_packfile_from_bitmap()` -> `midx_preferred_pack()`.
#
#     #4 0x000055555575401f in pack_pos_to_midx (m=0x555555997160, pos=0) at pack-revindex.c:394
#     #5 0x00005555557502c8 in midx_preferred_pack (bitmap_git=0x55555599c280) at pack-bitmap.c:1431
#     #6 0x000055555575036c in reuse_partial_packfile_from_bitmap (bitmap_git=0x55555599c280, packfile_out=0x5555559666b0 <reuse_packfile>, entries=0x5555559666b8 <reuse_packfile_objects>, reuse_out=0x5555559666c0 <reuse_packfile_bitmap>) at pack-bitmap.c:1452
#     #7 0x00005555556041f6 in get_object_list_from_bitmap (revs=0x7fffffffcbf0) at builtin/pack-objects.c:3658
#     #8 0x000055555560465c in get_object_list (ac=2, av=0x555555997050) at builtin/pack-objects.c:3765
#     #9 0x0000555555605e4e in cmd_pack_objects (argc=0, argv=0x7fffffffe920, prefix=0x0) at builtin/pack-objects.c:4154
#
# Since neither the .bitmap nor the MIDX stores the identity of the
# preferred pack, we infer it by trying to load the first object in
# pseudo-pack order, and then asking the MIDX which pack was chosen to
# represent that object.
#
# But this fails our bounds check, since there are zero objects in the
# MIDX to begin with, which results in the BUG().
#
# We could catch this more carefully in `midx_preferred_pack()`, but
# signaling the absence of a preferred pack out to all of its callers is
# somewhat awkward.
#
# Instead, let's avoid writing a MIDX .bitmap without any objects
# altogether. We catch this case in `write_midx_internal()`, and emit a
# warning if the caller indicated they wanted to write a bitmap before
# clearing out the relevant flags. If we somehow got to
# write_midx_bitmap(), then we will call BUG(), but this should now be an
# unreachable path.
#
# Signed-off-by: Taylor Blau <me@ttaylorr.com>
# Signed-off-by: Junio C Hamano <gitster@pobox.com>
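
# Sketch (hypothetical helpers, not part of Git; nothing below calls them):
# report how many objects a MIDX covers, which is the count that must be
# nonzero for the bitmap write described above to proceed. They assume the
# documented multi-pack-index layout:
#
#   - the chunk count is at header byte 6;
#   - 12-byte chunk lookup entries (4-byte ID, 8-byte big-endian offset)
#     start at byte 12;
#   - an `OIDF` fanout chunk holds 256 4-byte big-endian counts, and its
#     last entry is the total number of objects.
midx_sketch_be_uint () {
	# $1 = file, $2 = byte offset, $3 = width; print the big-endian value.
	sketch_be_val=0 &&
	for sketch_be_byte in $(dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | od -An -tu1)
	do
		sketch_be_val=$((sketch_be_val * 256 + sketch_be_byte))
	done &&
	echo "$sketch_be_val"
}

midx_sketch_object_count () {
	sketch_cnt_nr=$(midx_sketch_be_uint "$1" 6 1) &&
	sketch_cnt_i=0 &&
	while test "$sketch_cnt_i" -lt "$sketch_cnt_nr"
	do
		sketch_cnt_entry=$((12 + 12 * sketch_cnt_i)) &&
		sketch_cnt_id=$(dd if="$1" bs=1 skip="$sketch_cnt_entry" count=4 2>/dev/null) &&
		if test "$sketch_cnt_id" = OIDF
		then
			# Fanout entry 255 sits 1020 bytes into the chunk.
			sketch_cnt_off=$(midx_sketch_be_uint "$1" $((sketch_cnt_entry + 4)) 8) &&
			midx_sketch_be_uint "$1" $((sketch_cnt_off + 1020)) 4
			return
		fi &&
		sketch_cnt_i=$((sketch_cnt_i + 1)) || return 1
	done
	# No OIDF chunk was found in the lookup table.
	return 1
}
# Usage (assumption): test "$(midx_sketch_object_count "$midx")" -gt 0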

test_expect_success 'no .bitmap is written without any objects' '
	rm -fr repo &&
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
		cd repo &&
		empty="$(git pack-objects $objdir/pack/pack </dev/null)" &&
		cat >packs <<-EOF &&
		pack-$empty.idx
		EOF
		git multi-pack-index write --bitmap --stdin-packs \
			<packs 2>err &&
		grep "bitmap without any objects" err &&
		test_path_is_file $midx &&
		test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
	)
'

# pack-bitmap.c: gracefully fallback after opening pack/MIDX (2022-01-25 22:41:20 +00:00)
#
# When opening a MIDX/pack-bitmap, we call open_midx_bitmap_1() or
# open_pack_bitmap_1() respectively in a loop over the set of MIDXs/packs.
# By design, these functions are supposed to be called over every pack and
# MIDX, since only one of them should have a valid bitmap.
#
# Ordinarily we return '0' from these two functions in order to indicate
# that we successfully loaded a bitmap. To signal that we couldn't load a
# bitmap corresponding to the MIDX/pack (either because one doesn't exist,
# or because there was an error with loading it), we can return '-1'. In
# either case, the callers each enumerate all MIDXs/packs to ensure that at
# most one bitmap per kind is present.
#
# But when we fail to load a bitmap that does exist (for example, loading a
# MIDX bitmap without finding a corresponding reverse index), we'll return
# -1 but leave the 'midx' field non-NULL. So when we fall back to loading a
# pack bitmap, we'll complain that the bitmap we're trying to populate is
# already "opened", even though it isn't.
#
# Rectify this by setting the '->pack' and '->midx' fields back to NULL as
# appropriate. Two tests are added: one to ensure that the MIDX-to-pack
# bitmap fallback works, and another to ensure we still complain when
# there are multiple pack bitmaps in a repository.
#
# Signed-off-by: Taylor Blau <me@ttaylorr.com>
# Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
# Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
# Signed-off-by: Junio C Hamano <gitster@pobox.com>
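
# Sketch (a shell model of the selection rule described above, not Git's C
# code; nothing below calls it): candidates are tried in order, and a
# candidate only becomes the chosen bitmap after it "loads" successfully.
# A failed load therefore leaves no state behind, and a later candidate can
# still be used. Any further loadable candidate is reported as extra. Here,
# as an assumption, "loads successfully" is modelled as "is a readable,
# non-empty file", and the warning text is illustrative only.
bitmap_sketch_pick () {
	pick_chosen= &&
	for pick_file in "$@"
	do
		# Stand-in for open_midx_bitmap_1()/open_pack_bitmap_1() failing.
		test -r "$pick_file" && test -s "$pick_file" || continue
		if test -n "$pick_chosen"
		then
			echo "extra bitmap: $pick_file" >&2
		else
			# Record the choice only after a successful "load".
			pick_chosen=$pick_file
		fi
	done &&
	echo "$pick_chosen"
}
# Usage (assumption): bitmap_sketch_pick broken-midx.bitmap pack-1.bitmap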

test_expect_success 'graceful fallback when missing reverse index' '
	rm -fr repo &&
	git init repo &&
	test_when_finished "rm -fr repo" &&
	(
		cd repo &&
		test_commit base &&
		# write a pack and MIDX bitmap containing base
		git repack -adb &&
		git multi-pack-index write --bitmap &&
		GIT_TEST_MIDX_READ_RIDX=0 \
			git rev-list --use-bitmap-index HEAD 2>err &&
		! grep "ignoring extra bitmap file" err
	)
'

test_done