2013-12-21 14:00:38 +00:00
|
|
|
#!/bin/sh
|
|
|
|
|
|
|
|
test_description='exercise basic bitmap functionality'
|
tests: mark tests relying on the current default for `init.defaultBranch`
In addition to the manual adjustment to let the `linux-gcc` CI job run
the test suite with `master` and then with `main`, this patch makes sure
that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts
that currently rely on the initial branch name being `master` by default.
To determine which test scripts to mark up, the first step was to
force-set the default branch name to `master` in
- all test scripts that contain the keyword `master`,
- t4211, which expects `t/t4211/history.export` with a hard-coded ref to
initialize the default branch,
- t5560 because it sources `t/t556x_common` which uses `master`,
- t8002 and t8012 because both source `t/annotate-tests.sh` which also
uses `master`.
This trick was performed by this command:
$ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
' $(git grep -l master t/t[0-9]*.sh) \
t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh
After that, careful, manual inspection revealed that some of the test
scripts containing the needle `master` do not actually rely on a
specific default branch name: either they mention `master` only in a
comment, or they initialize that branch specifically, or they do not
actually refer to the current default branch. Therefore, the
aforementioned modification was undone in those test scripts thusly:
$ git checkout HEAD -- \
t/t0027-auto-crlf.sh t/t0060-path-utils.sh \
t/t1011-read-tree-sparse-checkout.sh \
t/t1305-config-include.sh t/t1309-early-config.sh \
t/t1402-check-ref-format.sh t/t1450-fsck.sh \
t/t2024-checkout-dwim.sh \
t/t2106-update-index-assume-unchanged.sh \
t/t3040-subprojects-basic.sh t/t3301-notes.sh \
t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \
t/t3436-rebase-more-options.sh \
t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \
t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \
t/t5511-refspec.sh t/t5526-fetch-submodules.sh \
t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \
t/t5548-push-porcelain.sh \
t/t5552-skipping-fetch-negotiator.sh \
t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \
t/t5614-clone-submodules-shallow.sh \
t/t7508-status.sh t/t7606-merge-custom.sh \
t/t9302-fast-import-unpack-limit.sh
We excluded one set of test scripts in these commands, though: the range
of `git p4` tests. The reason? `git p4` stores the (foreign) remote
branch in the branch called `p4/master`, which is obviously not the
default branch. Manual analysis revealed that only five of these tests
actually require a specific default branch name to pass; they were
modified thusly:
$ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\
GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
' t/t980[0167]*.sh t/t9811*.sh
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-18 23:44:19 +00:00
|
|
|
|
2013-12-21 14:00:38 +00:00
|
|
|
. ./test-lib.sh
|
2021-02-09 21:41:54 +00:00
|
|
|
. "$TEST_DIRECTORY"/lib-bitmap.sh
|
2013-12-21 14:00:38 +00:00
|
|
|
|
2021-08-31 20:52:36 +00:00
|
|
|
# t5310 deals only with single-pack bitmaps, so don't write MIDX bitmaps in
|
|
|
|
# their place.
|
|
|
|
GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
|
|
|
|
|
pack-bitmap.c: use commit boundary during bitmap traversal
When reachability bitmap coverage exists in a repository, Git will use a
different (and hopefully faster) traversal to compute revision walks.
Consider a set of positive and negative tips (which we'll refer to in
standard bitmap parlance as "wants" and "haves"). In order to
figure out what objects exist between the tips, the existing traversal
in `prepare_bitmap_walk()` does something like:
1. Consider if we can even compute the set of objects with bitmaps,
and fall back to the usual traversal if we cannot. For example,
pathspec limiting traversals can't be computed using bitmaps (since
they don't know which objects are at which paths). The same is true
of certain kinds of non-trivial object filters.
2. If we can compute the traversal with bitmaps, partition the
(dereferenced) tips into two object lists, "haves", and "wants",
based on whether or not the objects have the UNINTERESTING flag,
respectively.
3. Fall back to the ordinary object traversal if either (a) there are
more than zero haves, none of which are in the bitmapped pack or
MIDX, or (b) there are no wants.
4. Construct a reachability bitmap for the "haves" side by walking
from the revision tips down to any existing bitmaps, OR-ing in any
bitmaps as they are found.
5. Then do the same for the "wants" side, stopping at any objects that
appear in the "haves" bitmap.
6. Filter the results if any object filter (that can be easily
computed with bitmaps alone) was given, and then return back to the
caller.
When there is good bitmap coverage relative to the traversal tips, this
walk is often significantly faster than an ordinary object traversal
because it can visit far fewer objects.
But in certain cases, it can be significantly *slower* than the usual
object traversal. Why? Because we need to compute complete bitmaps on
either side of the walk. If either one (or both) of the sides require
walking many (or all!) objects before they get to an existing bitmap,
the extra bitmap machinery is mostly or all overhead.
One of the benefits, however, is that even if the walk is slower, bitmap
traversals are guaranteed to provide an *exact* answer. Unlike the
traditional object traversal algorithm, which can over-count the results
by not opening trees for older commits, the bitmap walk builds an exact
reachability bitmap for either side, meaning the results are never
over-counted.
But producing non-exact results is OK for our traversal here (both in
the bitmap case and not), as long as the results are over-counted, not
under.
Relaxing the bitmap traversal to allow it to produce over-counted
results gives us the opportunity to make some significant improvements.
Instead of the above, the new algorithm only has to walk from the
*boundary* down to the nearest bitmap, instead of from each of the
UNINTERESTING tips.
The boundary-based approach still has degenerate cases, but we'll show
in a moment that it is often a significant improvement.
The new algorithm works as follows:
1. Build a (partial) bitmap of the haves side by first OR-ing any
bitmap(s) that already exist for UNINTERESTING commits between the
haves and the boundary.
2. For each commit along the boundary, add it as a fill-in traversal
tip (where the traversal terminates once an existing bitmap is
found), and perform fill-in traversal.
3. Build up a complete bitmap of the wants side as usual, stopping any
time we intersect the (partial) haves side.
4. Return the results.
This is more or less equivalent to using the *old* algorithm with this
invocation:
$ git rev-list --objects --use-bitmap-index $WANTS --not \
$(git rev-list --objects --boundary $WANTS --not $HAVES |
perl -lne 'print $1 if /^-(.*)/')
The new result performs significantly better in many cases, particularly
when the distance from the boundary commit(s) to an existing bitmap is
shorter than the distance from (all of) the have tips to the nearest
bitmapped commit.
Note that when using the old bitmap traversal algorithm, the results can
be *slower* than without bitmaps! Under the new algorithm, the result is
computed nearly as fast with bitmaps as without (at the cost of
over-counting the true number of objects in a similar fashion as the
non-bitmap traversal):
# (Computing the number of tagged objects not on any branches
# without bitmaps).
$ time git rev-list --count --objects --tags --not --branches
20
real 0m1.388s
user 0m1.092s
sys 0m0.296s
# (Computing the same query using the old bitmap traversal).
$ time git rev-list --count --objects --tags --not --branches --use-bitmap-index
19
real 0m22.709s
user 0m21.628s
sys 0m1.076s
# (this commit)
$ time git.compile rev-list --count --objects --tags --not --branches --use-bitmap-index
19
real 0m1.518s
user 0m1.234s
sys 0m0.284s
The new algorithm is still slower than not using bitmaps at all, but it
is nearly a 15-fold improvement over the existing traversal.
In a more realistic setting (using my local copy of git.git), I can
observe a similar (if more modest) speed-up:
$ argv="--count --objects --branches --not --tags"
hyperfine \
-n 'no bitmaps' "git.compile rev-list $argv" \
-n 'existing traversal' "git.compile rev-list --use-bitmap-index $argv" \
-n 'boundary traversal' "git.compile -c pack.useBitmapBoundaryTraversal=true rev-list --use-bitmap-index $argv"
Benchmark 1: no bitmaps
Time (mean ± σ): 124.6 ms ± 2.1 ms [User: 103.7 ms, System: 20.8 ms]
Range (min … max): 122.6 ms … 133.1 ms 22 runs
Benchmark 2: existing traversal
Time (mean ± σ): 368.6 ms ± 3.0 ms [User: 325.3 ms, System: 43.1 ms]
Range (min … max): 365.1 ms … 374.8 ms 10 runs
Benchmark 3: boundary traversal
Time (mean ± σ): 167.6 ms ± 0.9 ms [User: 139.5 ms, System: 27.9 ms]
Range (min … max): 166.1 ms … 169.2 ms 17 runs
Summary
'no bitmaps' ran
1.34 ± 0.02 times faster than 'boundary traversal'
2.96 ± 0.05 times faster than 'existing traversal'
Here, the new algorithm is also still slower than not using bitmaps, but
represents a more than 2-fold improvement over the existing traversal in
a more modest example.
Since this algorithm was originally written (nearly a year and a half
ago, at the time of writing), the bitmap lookup table shipped, making
the new algorithm's result more competitive. A few other future
directions for improving bitmap traversal times beyond not using bitmaps
at all:
- Decrease the cost to decompress and OR together many bitmaps
(particularly when enumerating the uninteresting side of
the walk). Here we could explore more efficient bitmap storage
techniques, like Roaring+Run and/or use SIMD instructions to speed
up ORing them together.
- Store pseudo-merge bitmaps, which could allow us to OR together
fewer "summary" bitmaps (which would also help with the above).
Helped-by: Jeff King <peff@peff.net>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-08 17:38:12 +00:00
|
|
|
# Likewise, allow individual tests to control whether or not they use
|
|
|
|
# the boundary-based traversal.
|
|
|
|
sane_unset GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL
|
|
|
|
|
add `ignore_missing_links` mode to revwalk
When pack-objects is computing the reachability bitmap to
serve a fetch request, it can erroneously die() if some of
the UNINTERESTING objects are not present. Upload-pack
throws away HAVE lines from the client for objects we do not
have, but we may have a tip object without all of its
ancestors (e.g., if the tip is no longer reachable and was
new enough to survive a `git prune`, but some of its
reachable objects did get pruned).
In the non-bitmap case, we do a revision walk with the HAVE
objects marked as UNINTERESTING. The revision walker
explicitly ignores errors in accessing UNINTERESTING commits
to handle this case (and we do not bother looking at
UNINTERESTING trees or blobs at all).
When we have bitmaps, however, the process is quite
different. The bitmap index for a pack-objects run is
calculated in two separate steps:
First, we perform an extensive walk from all the HAVEs to
find the full set of objects reachable from them. This walk
is usually optimized away because we are expected to hit an
object with a bitmap during the traversal, which allows us
to terminate early.
Secondly, we perform an extensive walk from all the WANTs,
which usually also terminates early because we hit a commit
with an existing bitmap.
Once we have the resulting bitmaps from the two walks, we
AND-NOT them together to obtain the resulting set of objects
we need to pack.
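The AND-NOT step can be illustrated with a minimal sketch. It assumes toy bitmaps stored as plain integers, where bit i stands for "object i is reachable"; the variable names and values are hypothetical and unrelated to Git's real EWAH bitmaps.

```shell
# Toy bitmaps as integers; bit i set means "object i is reachable".
haves=$(( 0x0f ))   # objects 0-3 are reachable from the HAVE tips
wants=$(( 0x3c ))   # objects 2-5 are reachable from the WANT tips

# AND-NOT: keep what the wants reach but the haves do not.
to_pack=$(( wants & ~haves ))
printf 'to_pack=0x%x\n' "$to_pack"   # prints to_pack=0x30
```

Only objects 4 and 5 remain, since objects 2 and 3 are already covered by the HAVE side.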
When we are walking the HAVE objects, the revision walker
does not know that we are walking it only to mark the
results as uninteresting. We strip out the UNINTERESTING flag,
because those objects _are_ interesting to us during the
first walk. We want to keep going to get a complete set of
reachable objects if we can.
We need some way to tell the revision walker that it's OK to
silently truncate the HAVE walk, just like it does for the
UNINTERESTING case. This patch introduces a new
`ignore_missing_links` flag to the `rev_info` struct, which
we set only for the HAVE walk.
It also adds tests to cover UNINTERESTING objects missing
from several positions: a missing blob, a missing tree, and
a missing parent commit. The missing blob already worked (as
we do not care about its contents at all), but the other two
cases caused us to die().
Note that there are a few cases we do not need to test:
1. We do not need to test a missing tree, with the blob
still present. Without the tree that refers to it, we
would not know that the blob is relevant to our walk.
2. We do not need to test a tip commit that is missing.
Upload-pack omits these for us (and in fact, we
complain even in the non-bitmap case if it fails to do
so).
Reported-by: Siddharth Agarwal <sid0@fb.com>
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-28 10:00:43 +00:00
|
|
|
objpath () {
|
|
|
|
echo ".git/objects/$(echo "$1" | sed -e 's|\(..\)|\1/|')"
|
|
|
|
}
|
|
|
|
|
pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there
are two codepaths in pack-objects: with & without using bitmap
reachability index.
However, add_object_entry_from_bitmap(), unlike its non-bitmapped
counterpart add_object_entry(), does not check whether --local,
--honor-pack-keep, or --incremental should be respected. In the
non-bitmapped codepath this is handled in want_object_in_pack(), but
the bitmapped codepath has no such checking at all.
The bitmapped codepath nevertheless accepted all of those options and
kept using bitmap indices under such conditions, potentially giving
wrong output (e.g. including objects from a non-local or .keep'ed
pack).
We can easily fix this by noting the following: an object can reach
add_object_entry_from_bitmap() for one of two reasons:
1. it is an entry from the main pack covered by the bitmap index, or
2. it is an object from a (possibly alternate) loose store or another pack.
Case "2" can already be handled by want_object_in_pack(), and to cover
case "1" we can teach want_object_in_pack() to expect that *found_pack
can be non-NULL, meaning the caller has already found the object's pack
entry.
In want_object_in_pack() we take care to start the checks from the
already-found pack, if we have one, so that the answer is determined
right away when neither --local nor --honor-pack-keep is active. In
particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do
not harm the performance of clones served with bitmaps:
Test                      56dfeb62          this tree
-----------------------------------------------------------------
5310.2: repack to disk    9.08(8.20+0.25)   9.09(8.14+0.32) +0.1%
5310.3: simulated clone   1.92(2.12+0.08)   1.93(2.12+0.09) +0.5%
5310.4: simulated fetch   0.82(1.07+0.04)   0.82(1.06+0.04) +0.0%
5310.6: partial bitmap    1.96(2.42+0.13)   1.95(2.40+0.15) -0.5%

Test                      56dfeb62          this tree
-----------------------------------------------------------------
5310.2: repack to disk    9.11(8.16+0.32)   9.11(8.19+0.28) +0.0%
5310.3: simulated clone   1.93(2.14+0.07)   1.92(2.11+0.10) -0.5%
5310.4: simulated fetch   0.82(1.06+0.04)   0.82(1.04+0.05) +0.0%
5310.6: partial bitmap    1.95(2.38+0.16)   1.94(2.39+0.14) -0.5%

Test                      56dfeb62          this tree
-----------------------------------------------------------------
5310.2: repack to disk    9.13(8.17+0.31)   9.07(8.13+0.28) -0.7%
5310.3: simulated clone   1.92(2.13+0.07)   1.91(2.12+0.06) -0.5%
5310.4: simulated fetch   0.82(1.08+0.03)   0.82(1.08+0.03) +0.0%
5310.6: partial bitmap    1.96(2.43+0.14)   1.96(2.42+0.14) +0.0%
with delta timings showing they are all within noise from run to run.
In the general case we do not want to call find_pack_entry_one() more than
once, because it is expensive. This patch splits the loop in
want_object_in_pack() into two parts: finding the object and seeing if it
impacts our choice to include it in the pack. We may call the inexpensive
want_found_object() twice, but we will never call find_pack_entry_one() if we
do not need to.
I appreciate the help of Junio C Hamano and Jeff King in discussing
this change.
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-10 15:01:10 +00:00
|
|
|
# show objects present in pack ($1 should be associated *.idx)
|
|
|
|
list_packed_objects () {
|
t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test
The test 'pack-objects to file can use bitmap' added in 645c432d61
(pack-objects: use reachability bitmap index when generating
non-stdout pack, 2016-09-10) is silently buggy and doesn't check what
it's supposed to.
In 't5310-pack-bitmaps.sh', the 'list_packed_objects' helper function
does what its name implies by running:
git show-index <"$1" | cut -d' ' -f2
The test in question invokes this function like this:
list_packed_objects <packa-$packasha1.idx >packa.objects &&
list_packed_objects <packb-$packbsha1.idx >packb.objects &&
test_cmp packa.objects packb.objects
Note how these two callsites don't specify the name of the pack index
file as the function's parameter, but redirect the function's standard
input from it. This triggers an error message from the shell, as it
has no filename to redirect from in the function, but this error is
ignored, because it happens upstream of a pipe. Consequently, both
invocations produce empty 'pack{a,b}.objects' files, and the
subsequent 'test_cmp' happily finds those two empty files identical.
Fix these two 'list_packed_objects' invocations by specifying the pack
index files as parameters. Furthermore, eliminate the pipe in that
function by replacing it with an &&-chained pair of commands using an
intermediate file, so a failure of 'git show-index' or the shell
redirection will fail the test.
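The failure mode can be reproduced outside the test suite. The following is a minimal sketch rather than code from t5310: `list_with_pipe` and `list_with_chain` are hypothetical helper names, `cat` stands in for `git show-index`, and an empty argument stands in for the missing parameter. It assumes a shell running without `pipefail`.

```shell
# Hypothetical helpers for illustration; not part of t5310.

# Pipe version: the exit status is that of `cut`, so a failed
# redirection or a failed first command goes unnoticed.
list_with_pipe () {
	cat <"$1" | cut -d' ' -f2
}

# Chained version: any failure before `cut` fails the whole function.
list_with_chain () {
	cat <"$1" >object-list.tmp &&
	cut -d' ' -f2 object-list.tmp
}

# With an empty file name the redirection fails in both helpers.
if list_with_pipe "" 2>/dev/null
then
	echo "pipe: reported success"
else
	echo "pipe: reported failure"
fi

if list_with_chain "" 2>/dev/null
then
	echo "chain: reported success"
else
	echo "chain: reported failure"
fi
```

Under these assumptions the pipe variant reports success despite the failed redirection, while the chained variant reports failure, which is the behavior a test needs.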
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-14 11:47:21 +00:00
|
|
|
git show-index <"$1" >object-list &&
|
|
|
|
cut -d' ' -f2 object-list
|
|
|
|
}
|
|
|
|
|
|
|
|
# has_any pattern-file content-file
|
|
|
|
# tests whether content-file has any entry from pattern-file with entries being
|
|
|
|
# whole lines.
|
|
|
|
has_any () {
|
|
|
|
grep -Ff "$1" "$2"
|
|
|
|
}
|
|
|
|
|
2022-08-14 16:55:09 +00:00
|
|
|
test_bitmap_cases () {
|
|
|
|
writeLookupTable=false
|
|
|
|
for i in "$@"
|
|
|
|
do
|
|
|
|
case "$i" in
|
|
|
|
"pack.writeBitmapLookupTable") writeLookupTable=true;;
|
|
|
|
esac
|
|
|
|
done
|
|
|
|
|
|
|
|
test_expect_success 'setup test repository' '
|
|
|
|
rm -fr * .git &&
|
|
|
|
git init &&
|
|
|
|
git config pack.writeBitmapLookupTable '"$writeLookupTable"'
|
|
|
|
'
|
|
|
|
setup_bitmap_history
|
|
|
|
|
|
|
|
test_expect_success 'setup writing bitmaps during repack' '
|
|
|
|
git config repack.writeBitmaps true
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'full repack creates bitmaps' '
|
|
|
|
GIT_TRACE2_EVENT="$(pwd)/trace" \
|
|
|
|
git repack -ad &&
|
|
|
|
ls .git/objects/pack/ | grep bitmap >output &&
|
|
|
|
test_line_count = 1 output &&
|
|
|
|
grep "\"key\":\"num_selected_commits\",\"value\":\"106\"" trace &&
|
|
|
|
grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
|
|
|
|
'
|
|
|
|
|
|
|
|
basic_bitmap_tests
|
|
|
|
|
|
|
|
test_expect_success 'pack-objects respects --local (non-local loose)' '
|
|
|
|
git init --bare alt.git &&
|
|
|
|
echo $(pwd)/alt.git/objects >.git/objects/info/alternates &&
|
|
|
|
echo content1 >file1 &&
|
|
|
|
# non-local loose object which is not present in bitmapped pack
|
|
|
|
altblob=$(GIT_DIR=alt.git git hash-object -w file1) &&
|
|
|
|
# non-local loose object which is also present in bitmapped pack
|
|
|
|
git cat-file blob $blob | GIT_DIR=alt.git git hash-object -w --stdin &&
|
|
|
|
git add file1 &&
|
|
|
|
test_tick &&
|
|
|
|
git commit -m commit_file1 &&
|
|
|
|
echo HEAD | git pack-objects --local --stdout --revs >1.pack &&
|
|
|
|
git index-pack 1.pack &&
|
|
|
|
list_packed_objects 1.idx >1.objects &&
|
|
|
|
printf "%s\n" "$altblob" "$blob" >nonlocal-loose &&
|
|
|
|
! has_any nonlocal-loose 1.objects
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'pack-objects respects --honor-pack-keep (local non-bitmapped pack)' '
|
|
|
|
echo content2 >file2 &&
|
|
|
|
blob2=$(git hash-object -w file2) &&
|
|
|
|
git add file2 &&
|
|
|
|
test_tick &&
|
|
|
|
git commit -m commit_file2 &&
|
|
|
|
printf "%s\n" "$blob2" "$bitmaptip" >keepobjects &&
|
|
|
|
pack2=$(git pack-objects pack2 <keepobjects) &&
|
|
|
|
mv pack2-$pack2.* .git/objects/pack/ &&
|
|
|
|
>.git/objects/pack/pack2-$pack2.keep &&
|
|
|
|
rm $(objpath $blob2) &&
|
|
|
|
echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >2a.pack &&
|
|
|
|
git index-pack 2a.pack &&
|
|
|
|
list_packed_objects 2a.idx >2a.objects &&
|
|
|
|
! has_any keepobjects 2a.objects
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'pack-objects respects --local (non-local pack)' '
|
|
|
|
mv .git/objects/pack/pack2-$pack2.* alt.git/objects/pack/ &&
|
|
|
|
echo HEAD | git pack-objects --local --stdout --revs >2b.pack &&
|
|
|
|
git index-pack 2b.pack &&
|
|
|
|
list_packed_objects 2b.idx >2b.objects &&
|
|
|
|
! has_any keepobjects 2b.objects
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'pack-objects respects --honor-pack-keep (local bitmapped pack)' '
|
|
|
|
ls .git/objects/pack/ | grep bitmap >output &&
|
|
|
|
test_line_count = 1 output &&
|
|
|
|
packbitmap=$(basename $(cat output) .bitmap) &&
|
|
|
|
list_packed_objects .git/objects/pack/$packbitmap.idx >packbitmap.objects &&
|
|
|
|
test_when_finished "rm -f .git/objects/pack/$packbitmap.keep" &&
|
|
|
|
>.git/objects/pack/$packbitmap.keep &&
|
|
|
|
echo HEAD | git pack-objects --honor-pack-keep --stdout --revs >3a.pack &&
|
|
|
|
git index-pack 3a.pack &&
|
|
|
|
list_packed_objects 3a.idx >3a.objects &&
|
|
|
|
! has_any packbitmap.objects 3a.objects
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'pack-objects respects --local (non-local bitmapped pack)' '
|
|
|
|
mv .git/objects/pack/$packbitmap.* alt.git/objects/pack/ &&
|
|
|
|
rm -f .git/objects/pack/multi-pack-index &&
|
|
|
|
test_when_finished "mv alt.git/objects/pack/$packbitmap.* .git/objects/pack/" &&
|
|
|
|
echo HEAD | git pack-objects --local --stdout --revs >3b.pack &&
|
|
|
|
git index-pack 3b.pack &&
|
|
|
|
list_packed_objects 3b.idx >3b.objects &&
|
|
|
|
! has_any packbitmap.objects 3b.objects
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'pack-objects to file can use bitmap' '
|
|
|
|
# make sure we still have 1 bitmap index from previous tests
|
|
|
|
ls .git/objects/pack/ | grep bitmap >output &&
|
|
|
|
test_line_count = 1 output &&
|
|
|
|
# verify equivalent packs are generated with/without using bitmap index
|
|
|
|
packasha1=$(git pack-objects --no-use-bitmap-index --all packa </dev/null) &&
|
|
|
|
packbsha1=$(git pack-objects --use-bitmap-index --all packb </dev/null) &&
|
|
|
|
list_packed_objects packa-$packasha1.idx >packa.objects &&
|
|
|
|
list_packed_objects packb-$packbsha1.idx >packb.objects &&
|
|
|
|
test_cmp packa.objects packb.objects
|
|
|
|
'
|
|
|
|
|
|
|
|
test_expect_success 'full repack, reusing previous bitmaps' '
|
pack-bitmap-write: build fewer intermediate bitmaps
The bitmap_writer_build() method calls bitmap_builder_init() to
construct a list of commits reachable from the selected commits along
with a "reverse graph". This reverse graph has edges pointing from a
commit to other commits that can reach that commit. After computing a
reachability bitmap for a commit, the values in that bitmap are then
copied to the reachability bitmaps across the edges in the reverse
graph.
We can now relax the role of the reverse graph to greatly reduce the
number of intermediate reachability bitmaps we compute during this
reverse walk. The end result is that we walk objects the same number of
times as before when constructing the reachability bitmaps, but we also
spend much less time copying bits between bitmaps and have much lower
memory pressure in the process.
The core idea is to select a set of "important" commits based on
interactions among the sets of commits reachable from each selected commit.
The first technical concept is to create a new 'commit_mask' member in the
bb_commit struct. Note that the selected commits are provided in an
ordered array. The first thing to do is to mark the ith bit in the
commit_mask for the ith selected commit. As we walk the commit-graph, we
copy the bits in a commit's commit_mask to its parents. At the end of
the walk, the ith bit in the commit_mask for a commit C stores a boolean
representing "The ith selected commit can reach C."
As we walk, we will discover non-selected commits that are important. We
will get into this later, but those important commits must also receive
bit positions, growing the width of the bitmasks as we walk. At the true
end of the walk, the ith bit means "the ith _important_ commit can reach
C."
MAXIMAL COMMITS
---------------
We use a new 'maximal' bit in the bb_commit struct to represent whether
a commit is important or not. The term "maximal" comes from the
partially-ordered set of commits in the commit-graph where C >= P if P
is a parent of C, and then extending the relationship transitively.
Rather than taking the maximal commits across the entire commit-graph, we
focus on selecting each commit that is maximal among commits
with the same bits on in their commit_mask. This definition is
important, so let's consider an example.
Suppose we have three selected commits A, B, and C. These are assigned
bitmasks 100, 010, and 001 to start. Each of these can be marked as
maximal immediately because they each will be the uniquely maximal
commit that contains their own bit. Keep in mind that these commits
may have different bitmasks after the walk; for example, if B can reach
C but A cannot, then the final bitmask for C is 011. Even in these
cases, C would still be a maximal commit among all commits with the
third bit on in their masks.
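The mask propagation in this example can be sketched with shell arithmetic. This is an illustration only: the masks are 3-bit integers, the graph is the hypothetical one from the paragraph above (B can reach C, A cannot), and `show_mask` is a made-up helper.

```shell
# Initial masks for the selected commits: A=100, B=010, C=001 (binary).
mask_a=4
mask_b=2
mask_c=1

# Assumption for this toy graph: C is an ancestor of B but not of A.
# Walking from children to parents, a commit ORs its mask into the
# mask of each commit it can reach, so only B contributes to C.
mask_c=$(( mask_c | mask_b ))

# Print a 3-bit mask in binary.
show_mask () {
	printf '%d%d%d\n' $(( ($1 >> 2) & 1 )) $(( ($1 >> 1) & 1 )) $(( $1 & 1 ))
}

show_mask "$mask_c"   # prints 011
```

C keeps its own third bit and gains B's bit, which matches the final bitmask 011 given in the text.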
Now define sets X, Y, and Z to be the sets of commits reachable from A,
B, and C, respectively. The intersections of these sets correspond to
different bitmasks:
* 100: X - (Y union Z)
* 010: Y - (X union Z)
* 001: Z - (X union Y)
* 110: (X intersect Y) - Z
* 101: (X intersect Z) - Y
* 011: (Y intersect Z) - X
* 111: X intersect Y intersect Z
This can be visualized with the following Hasse diagram:
    100    010    001
     | \  /   \  / |
     |  \/     \/  |
     |  /\     /\  |
     | /  \   /  \ |
    110    101    011
      \___  |  ___/
          \ | /
           111
Some of these bitmasks may not be represented, depending on the topology
of the commit-graph. In fact, we are counting on it, since the number of
possible bitmasks is exponential in the number of selected commits, but
is also limited by the total number of commits. In practice, very few
bitmasks are possible because most commits converge on a common "trunk"
in the commit history.
With this three-bit example, we wish to find commits that are maximal
for each bitmask. How can we identify this as we are walking?
As we walk, we visit a commit C. Since we are walking the commits in
topo-order, we know that C is visited after all of its children are
visited. Thus, when we get C from the revision walk we inspect the
'maximal' property of its bb_data and use that to determine if C is truly
important. Its commit_mask is also nearly final. If C is not one of the
originally-selected commits, then assign a bit position to C (by
incrementing num_maximal) and set that bit on in commit_mask. See
"MULTIPLE MAXIMAL COMMITS" below for more detail on this.
Now that the commit C is known to be maximal or not, consider each
parent P of C. Compute two new values:
* c_not_p : true if and only if the commit_mask for C contains a bit
that is not contained in the commit_mask for P.
* p_not_c : true if and only if the commit_mask for P contains a bit
that is not contained in the commit_mask for C.
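Both values reduce to the same bit test with the arguments swapped. The sketch below assumes masks held as integers and reads p_not_c as "P has a bit that C lacks"; `has_extra_bit` and the sample masks are hypothetical.

```shell
# Succeeds when the first mask has a bit that the second mask lacks.
has_extra_bit () {
	[ $(( $1 & ~$2 )) -ne 0 ]
}

c_mask=6   # 110
p_mask=3   # 011

if has_extra_bit "$c_mask" "$p_mask"; then c_not_p=true; else c_not_p=false; fi
if has_extra_bit "$p_mask" "$c_mask"; then p_not_c=true; else p_not_c=false; fi

echo "c_not_p=$c_not_p p_not_c=$p_not_c"   # prints c_not_p=true p_not_c=true
```

Here C has bits to copy to P, and P also carries a bit that C does not.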
If c_not_p is false, then P already has all of the bits that C would
provide to its commit_mask. In this case, move on to other parents as C
has nothing to contribute to P's state that was not already provided by
other children of P.
We continue with the case that c_not_p is true. This means there are
bits in C's commit_mask to copy to P's commit_mask, so use bitmap_or()
to add those bits.
If p_not_c is also true, then set the maximal bit for P to one. This means
that if no other commit has P as a parent, then P is definitely maximal.
This is because no child had the same bitmask. It is important to think
about the maximal bit for P at this point as a temporary state: "P is
maximal based on current information."
In contrast, if p_not_c is false, then set the maximal bit for P to
zero. Further, clear all reverse_edges for P since any edges that were
previously assigned to P are no longer important. P will gain all
reverse edges based on C.
The final thing we need to do is to update the reverse edges for P.
These reverse edges represent "which closest maximal commits
contributed bits to my commit_mask?" Since C contributed bits to P's
commit_mask in this case, C must add to the reverse edges of P.
If C is maximal, then C is a 'closest' maximal commit that contributed
bits to P. Add C to P's reverse_edges list.
Otherwise, C has a list of maximal commits that contributed bits to its
bitmask (and this list has exactly one element). Add all of these items
to P's reverse_edges list. Be careful to ignore duplicates here.
After inspecting all parents P for a commit C, we can clear the
commit_mask for C. This reduces the memory load to be limited to the
"width" of the commit graph.
Consider our ABC/XYZ example from earlier and let's inspect the state of
the commits for an interesting bitmask, say 011. Suppose that D is the
only maximal commit with this bitmask (in the first three bits). All
other commits with bitmask 011 have D as the only entry in their
reverse_edges list. D's reverse_edges list contains B and C.
COMPUTING REACHABILITY BITMAPS
------------------------------
Now that we have our definition, let's zoom out and consider what
happens with our new reverse graph when computing reachability bitmaps.
We walk the reverse graph in reverse-topo-order, so we visit commits
with largest commit_masks first. After we compute the reachability
bitmap for a commit C, we push the bits in that bitmap to each commit D
in the reverse edge list for C. Then, when we finally visit D we already
have the bits for everything reachable from maximal commits that D can
reach and we only need to walk the objects in the set-difference.
In our ABC/XYZ example, when we finally walk for the commit A we only
need to walk commits with bitmask equal to A's bitmask. If that bitmask
is 100, then we are only walking commits in X - (Y union Z) because the
bitmap already contains the bits for objects reachable from (X intersect
Y) union (X intersect Z) (i.e. the bits from the reachability bitmaps
for the maximal commits with bitmasks 110 and 101).
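The push along reverse edges can be sketched on a toy history. This is
only an illustration under stated assumptions: Python sets stand in for
reachability bitmaps, and the graph, the commit names and build_bitmaps
are all hypothetical.

```python
def build_bitmaps(parents, maximal_topo, reverse_edges):
    """Fill a reachability set for every maximal commit.

    maximal_topo lists the maximal commits children-first, so iterating
    it in reverse visits the deepest maximal commits first.
    """
    bitmaps = {c: set() for c in maximal_topo}
    walked = []  # every commit we actually had to walk, in order
    for c in reversed(maximal_topo):
        stack = [c]
        while stack:
            commit = stack.pop()
            if commit in bitmaps[c]:
                continue  # already covered by bits pushed from below
            bitmaps[c].add(commit)
            walked.append(commit)
            stack.extend(parents[commit])
        # Push the finished set to the closest maximal commits above c.
        for d in reverse_edges[c]:
            bitmaps[d] |= bitmaps[c]
    return bitmaps, walked


# A and B are selected; M is the maximal commit for the shared history.
parents = {
    "A": ["a1"], "a1": ["M"],
    "B": ["b1"], "b1": ["M"],
    "M": ["m1"], "m1": ["root"], "root": [],
}
reverse_edges = {"M": ["A", "B"], "A": [], "B": []}
bitmaps, walked = build_bitmaps(parents, ["A", "B", "M"], reverse_edges)
```

In this toy history every commit is walked exactly once, even though the
history below M is reachable from both A and B.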
The behavior is intended to walk each commit (and the trees that commit
introduces) at most once while allocating and copying fewer reachability
bitmaps. There is one caveat: what happens when there are multiple
maximal commits with the same bitmask, with respect to the initial set
of selected commits?
MULTIPLE MAXIMAL COMMITS
------------------------
Earlier, we mentioned that when we discover a new maximal commit, we
assign a new bit position to that commit and set that bit position to
one for that commit. This is absolutely important for interesting
commit-graphs such as git/git and torvalds/linux. The reason is due to
the existence of "butterflies" in the commit-graph partial order.
Here is an example of four commits forming a butterfly:
    I    J
    |\  /|
    | \/ |
    | /\ |
    |/  \|
    M    N
     \  /
      |/
      Q
Here, I and J both have parents M and N. In general, these do not need
to be exact parent relationships, but reachability relationships. The
most important part is that M and N cannot reach each other, so they are
independent in the partial order. If I had commit_mask 10 and J had
commit_mask 01, then M and N would both be assigned commit_mask 11 and
be maximal commits with the bitmask 11. Then, what happens when M and N
can both reach a commit Q? If Q is also assigned the bitmask 11, then it
is not maximal but is reachable from both M and N.
While this is not necessarily a deal-breaker for our abstract definition
of finding maximal commits according to a given bitmask, we have a few
issues that can come up in our larger picture of constructing
reachability bitmaps.
In particular, if we do not also consider Q to be a "maximal" commit,
then we will walk commits reachable from Q twice: once when computing
the reachability bitmap for M and another time when computing the
reachability bitmap for N. This becomes much worse if the topology
continues this pattern with multiple butterflies.
The solution has already been mentioned: each of M and N are assigned
their own bits to the bitmask and hence they become uniquely maximal for
their bitmasks. Finally, Q also becomes maximal and thus we do not need
to walk its commits multiple times. The final bitmasks for these commits
are as follows:
    I:10       J:01
     |\        /|
     | \ _____/ |
     | /\____   |
     |/      \  |
    M:111    N:1101
        \      /
         Q:1111
Further, Q's reverse edge list is { M, N }, while M and N both have
reverse edge list { I, J }.
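The walk described earlier can be replayed on this butterfly. The sketch
below is a simplified model, not the C implementation: masks are Python
integers, and find_maximal and its arguments are hypothetical names.

```python
def find_maximal(parents, topo_order, selected):
    """Return (maximal commits in visit order, reverse edges).

    topo_order must list every commit after all of its children.
    """
    mask = {c: 1 << bit for bit, c in enumerate(selected)}
    maximal = {c: True for c in selected}
    reverse_edges = {c: [] for c in topo_order}
    num_maximal = len(selected)
    result = []

    for c in topo_order:
        if c not in mask:
            continue  # not reachable from any selected commit
        if maximal.get(c, False):
            if c not in selected:
                # Newly discovered maximal commit: give it its own bit.
                mask[c] |= 1 << num_maximal
                num_maximal += 1
            result.append(c)

        for p in parents[c]:
            if p not in mask:
                mask[p] = 0
                c_not_p, p_not_c = True, False
            else:
                c_not_p = (mask[c] & ~mask[p]) != 0
                p_not_c = (mask[p] & ~mask[c]) != 0
            if not c_not_p:
                continue  # C has nothing new to offer P
            mask[p] |= mask[c]
            if p_not_c:
                maximal[p] = True
            else:
                maximal[p] = False
                reverse_edges[p] = []
            # Record the closest maximal commits that fed bits into P.
            sources = [c] if maximal.get(c, False) else reverse_edges[c]
            for s in sources:
                if s not in reverse_edges[p]:
                    reverse_edges[p].append(s)

        del mask[c]  # C's mask is no longer needed

    return result, reverse_edges


parents = {"I": ["M", "N"], "J": ["M", "N"], "M": ["Q"], "N": ["Q"], "Q": []}
maximal, reverse_edges = find_maximal(
    parents, ["I", "J", "M", "N", "Q"], ["I", "J"]
)
```

With this input, all five commits come out maximal and the reverse edges
agree with the lists given above.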
PERFORMANCE MEASUREMENTS
------------------------
Now that we've spent a LOT of time on the theory of this algorithm,
let's show that this is actually worth all that effort.
To test the performance, use GIT_TRACE2_PERF=1 when running
'git repack -abd' in a repository with no existing reachability bitmaps.
This avoids any issues with keeping existing bitmaps to skew the
numbers.
Inspect the "building_bitmaps_total" region in the trace2 output to
focus on the portion of work that is affected by this change. Here are
the performance comparisons for a few repositories. The timings are for
the following versions of Git: "multi" is the timing from before any
reverse graph is constructed, where we might perform multiple
traversals. "reverse" is for the previous change where the reverse graph
has every reachable commit. Finally "maximal" is the version introduced
here where the reverse graph only contains the maximal commits.
Repository: git/git
multi: 2.628 sec
reverse: 2.344 sec
maximal: 2.047 sec
Repository: torvalds/linux
multi: 64.7 sec
reverse: 205.3 sec
maximal: 44.7 sec
So we have not only recovered the time lost by switching to the
reverse-edge algorithm, but we also come out ahead of "multi" in both
repositories. Likewise, peak heap has gone back to something reasonable:
Repository: torvalds/linux
multi: 2.087 GB
reverse: 3.141 GB
maximal: 2.288 GB
While I do not have access to full fork networks on GitHub, Peff has run
this algorithm on the chromium/chromium fork network and reported a
change from 3 hours to ~233 seconds. That network is particularly
beneficial for this approach because it has a long, linear history along
with many tags. The "multi" approach was obviously quadratic and the new
approach is linear.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08 22:04:30 +00:00
		git repack -ad &&
2022-08-14 16:55:09 +00:00
		ls .git/objects/pack/ | grep bitmap >output &&
		test_line_count = 1 output
	'

	test_expect_success 'fetch (full bitmap)' '
		git --git-dir=clone.git fetch origin second:second &&
		git rev-parse HEAD >expect &&
		git --git-dir=clone.git rev-parse HEAD >actual &&
		test_cmp expect actual
	'

	test_expect_success 'create objects for missing-HAVE tests' '
		blob=$(echo "missing have" | git hash-object -w --stdin) &&
		tree=$(printf "100644 blob $blob\tfile\n" | git mktree) &&
		parent=$(echo parent | git commit-tree $tree) &&
		commit=$(echo commit | git commit-tree $tree -p $parent) &&
		cat >revs <<-EOF
		HEAD
		^HEAD^
		^$commit
		EOF
	'

	test_expect_success 'pack-objects respects --incremental' '
		cat >revs2 <<-EOF &&
		HEAD
		$commit
		EOF
		git pack-objects --incremental --stdout --revs <revs2 >4.pack &&
		git index-pack 4.pack &&
		list_packed_objects 4.idx >4.objects &&
		test_line_count = 4 4.objects &&
		git rev-list --objects $commit >revlist &&
		cut -d" " -f1 revlist |sort >objects &&
		test_cmp 4.objects objects
	'

	test_expect_success 'pack with missing blob' '
		rm $(objpath $blob) &&
		git pack-objects --stdout --revs <revs >/dev/null
	'

	test_expect_success 'pack with missing tree' '
		rm $(objpath $tree) &&
		git pack-objects --stdout --revs <revs >/dev/null
	'

	test_expect_success 'pack with missing parent' '
		rm $(objpath $parent) &&
		git pack-objects --stdout --revs <revs >/dev/null
	'

	test_expect_success JGIT,SHA1 'we can read jgit bitmaps' '
		git clone --bare . compat-jgit.git &&
		(
			cd compat-jgit.git &&
			rm -f objects/pack/*.bitmap &&
			jgit gc &&
			git rev-list --test-bitmap HEAD
		)
	'

	test_expect_success JGIT,SHA1 'jgit can read our bitmaps' '
		git clone --bare . compat-us.git &&
		(
			cd compat-us.git &&
			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
			git repack -adb &&
			# jgit gc will barf if it does not like our bitmaps
			jgit gc
		)
	'

	test_expect_success 'splitting packs does not generate bogus bitmaps' '
		test-tool genrandom foo $((1024 * 1024)) >rand &&
		git add rand &&
		git commit -m "commit with big file" &&
		git -c pack.packSizeLimit=500k repack -adb &&
		git init --bare no-bitmaps.git &&
		git -C no-bitmaps.git fetch .. HEAD
	'

	test_expect_success 'set up reusable pack' '
		rm -f .git/objects/pack/*.keep &&
		git repack -adb &&
		reusable_pack () {
			git for-each-ref --format="%(objectname)" |
				git pack-objects --delta-base-offset --revs --stdout "$@"
		}
	'

	test_expect_success 'pack reuse respects --honor-pack-keep' '
		test_when_finished "rm -f .git/objects/pack/*.keep" &&
		for i in .git/objects/pack/*.pack
		do
			>${i%.pack}.keep || return 1
		done &&
		reusable_pack --honor-pack-keep >empty.pack &&
		git index-pack empty.pack &&
		git show-index <empty.idx >actual &&
		test_must_be_empty actual
	'

	test_expect_success 'pack reuse respects --local' '
		mv .git/objects/pack/* alt.git/objects/pack/ &&
		test_when_finished "mv alt.git/objects/pack/* .git/objects/pack/" &&
		reusable_pack --local >empty.pack &&
		git index-pack empty.pack &&
		git show-index <empty.idx >actual &&
		test_must_be_empty actual
	'

	test_expect_success 'pack reuse respects --incremental' '
		reusable_pack --incremental >empty.pack &&
		git index-pack empty.pack &&
		git show-index <empty.idx >actual &&
		test_must_be_empty actual
	'

	test_expect_success 'truncated bitmap fails gracefully (ewah)' '
		test_config pack.writebitmaphashcache false &&
2022-08-14 16:55:10 +00:00
		test_config pack.writebitmaplookuptable false &&
2022-08-14 16:55:09 +00:00
		git repack -ad &&
		git rev-list --use-bitmap-index --count --all >expect &&
		bitmap=$(ls .git/objects/pack/*.bitmap) &&
		test_when_finished "rm -f $bitmap" &&
		test_copy_bytes 256 <$bitmap >$bitmap.tmp &&
		mv -f $bitmap.tmp $bitmap &&
		git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
		test_cmp expect actual &&
2023-10-31 05:23:30 +00:00
		test_grep corrupt.ewah.bitmap stderr
2022-08-14 16:55:09 +00:00
	'

	test_expect_success 'truncated bitmap fails gracefully (cache)' '
2022-08-14 16:55:10 +00:00
		git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
2022-08-14 16:55:09 +00:00
		git repack -ad &&
		git rev-list --use-bitmap-index --count --all >expect &&
		bitmap=$(ls .git/objects/pack/*.bitmap) &&
		test_when_finished "rm -f $bitmap" &&
		test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
		mv -f $bitmap.tmp $bitmap &&
		git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
		test_cmp expect actual &&
2023-10-31 05:23:30 +00:00
		test_grep corrupted.bitmap.index stderr
2022-08-14 16:55:09 +00:00
	'

	# Create a state of history with these properties:
	#
	#  - refs that allow a client to fetch some new history, while sharing some old
	#    history with the server; we use branches delta-reuse-old and
	#    delta-reuse-new here
	#
	#  - the new history contains an object that is stored on the server as a delta
	#    against a base that is in the old history
	#
	#  - the base object is not immediately reachable from the tip of the old
	#    history; finding it would involve digging down through history we know the
	#    other side has
	#
	# This should result in a state where fetching from old->new would not
	# traditionally reuse the on-disk delta (because we'd have to dig to realize
	# that the client has it), but we will do so if bitmaps can tell us cheaply
	# that the other side has it.
	test_expect_success 'set up thin delta-reuse parent' '
		# This first commit contains the buried base object.
		test-tool genrandom delta 16384 >file &&
		git add file &&
		git commit -m "delta base" &&
		base=$(git rev-parse --verify HEAD:file) &&

		# These intermediate commits bury the base back in history.
		# This becomes the "old" state.
		for i in 1 2 3 4 5
		do
			echo $i >file &&
			git commit -am "intermediate $i" || return 1
		done &&
		git branch delta-reuse-old &&

		# And now our new history has a delta against the buried base. Note
		# that this must be smaller than the original file, since pack-objects
		# prefers to create deltas from smaller objects to larger.
		test-tool genrandom delta 16300 >file &&
		git commit -am "delta result" &&
		delta=$(git rev-parse --verify HEAD:file) &&
		git branch delta-reuse-new &&

		# Repack with bitmaps and double check that we have the expected delta
		# relationship.
		git repack -adb &&
		have_delta $delta $base
	'

	# Now we can sanity-check the non-bitmap behavior (that the server is not able
	# to reuse the delta). This isn't strictly something we care about, so this
	# test could be scrapped in the future. But it makes sure that the next test is
	# actually triggering the feature we want.
	#
	# Note that our tools for working with on-the-wire "thin" packs are limited. So
	# we actually perform the fetch, retain the resulting pack, and inspect the
	# result.
	test_expect_success 'fetch without bitmaps ignores delta against old base' '
		test_config pack.usebitmaps false &&
		test_when_finished "rm -rf client.git" &&
		git init --bare client.git &&
		(
			cd client.git &&
			git config transfer.unpackLimit 1 &&
			git fetch .. delta-reuse-old:delta-reuse-old &&
			git fetch .. delta-reuse-new:delta-reuse-new &&
			have_delta $delta $ZERO_OID
		)
	'

	# And do the same for the bitmap case, where we do expect to find the delta.
	test_expect_success 'fetch with bitmaps can reuse old base' '
		test_config pack.usebitmaps true &&
		test_when_finished "rm -rf client.git" &&
		git init --bare client.git &&
		(
			cd client.git &&
			git config transfer.unpackLimit 1 &&
			git fetch .. delta-reuse-old:delta-reuse-old &&
			git fetch .. delta-reuse-new:delta-reuse-new &&
			have_delta $delta $base
		)
	'

	test_expect_success 'pack.preferBitmapTips' '
		git init repo &&
		test_when_finished "rm -fr repo" &&
		(
			cd repo &&
			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&

			# create enough commits that not all receive bitmap
			# coverage even if they are all at the tip of some reference.
			test_commit_bulk --message="%s" 103 &&

			git rev-list HEAD >commits.raw &&
			sort <commits.raw >commits &&

			git log --format="create refs/tags/%s %H" HEAD >refs &&
			git update-ref --stdin <refs &&

			git repack -adb &&
			test-tool bitmap list-commits | sort >bitmaps &&

			# remember which commits did not receive bitmaps
			comm -13 bitmaps commits >before &&
			test_file_not_empty before &&

			# mark the commits which did not receive bitmaps as preferred,
			# and generate the bitmap again
			perl -pe "s{^}{create refs/tags/include/$. }" <before |
				git update-ref --stdin &&
			git -c pack.preferBitmapTips=refs/tags/include repack -adb &&

			# finally, check that the commit(s) without bitmap coverage
			# are not the same ones as before
			test-tool bitmap list-commits | sort >bitmaps &&
			comm -13 bitmaps commits >after &&

			! test_cmp before after
		)
	'

config API: add "string" version of *_value_multi(), fix segfaults
Fix numerous and mostly long-standing segfaults in consumers of
the *_config_*value_multi() API. As discussed in the preceding commit
an empty key in the config syntax yields a "NULL" string, which these
users would give to strcmp() (or similar), resulting in segfaults.
As this change shows, most users of the *_config_*value_multi()
API didn't really want such an unsafe and low-level API, let's give
them something with the safety of git_config_get_string() instead.
This fix is similar to what the *_string() functions and others
acquired in [1] and [2]. Namely introducing and using a safer
"*_get_string_multi()" variant of the low-level "_*value_multi()"
function.
This fixes segfaults in code introduced in:
- d811c8e17c6 (versionsort: support reorder prerelease suffixes, 2015-02-26)
- c026557a373 (versioncmp: generalize version sort suffix reordering, 2016-12-08)
- a086f921a72 (submodule: decouple url and submodule interest, 2017-03-17)
- a6be5e6764a (log: add log.excludeDecoration config option, 2020-04-16)
- 92156291ca8 (log: add default decoration filter, 2022-08-05)
- 50a044f1e40 (gc: replace config subprocesses with API calls, 2022-09-27)
There are now two users of the low-level API:
- One in "builtin/for-each-repo.c", which we'll convert in a
subsequent commit.
- The "t/helper/test-config.c" code added in [3].
As seen in the preceding commit we need to give the
"t/helper/test-config.c" caller these "NULL" entries.
We could also alter the underlying git_configset_get_value_multi()
function to be "string safe", but doing so would leave no room for
other variants of "*_get_value_multi()" that coerce to other types.
Such coercion can't be built on the string version, since as we've
established "NULL" is a true value in the boolean context, but if we
coerced it to "" for use in a list of strings it'll be subsequently
coerced to "false" as a boolean.
The callback pattern being used here will make it easy to introduce
e.g. a "multi" variant which coerces its values to "bool", "int",
"path" etc.
1. 40ea4ed9032 (Add config_error_nonbool() helper function,
2008-02-11)
2. 6c47d0e8f39 (config.c: guard config parser from value=NULL,
2008-02-11).
3. 4c715ebb96a (test-config: add tests for the config_set API,
2014-07-28)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 14:04:27 +00:00
	test_expect_success 'pack.preferBitmapTips' '
2023-03-28 14:04:26 +00:00
		git init repo &&
		test_when_finished "rm -rf repo" &&
		(
			cd repo &&
			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
			test_commit_bulk --message="%s" 103 &&

			cat >>.git/config <<-\EOF &&
			[pack]
				preferBitmapTips
			EOF
2023-03-28 14:04:27 +00:00
			cat >expect <<-\EOF &&
			error: missing value for '\''pack.preferbitmaptips'\''
			EOF
			git repack -adb 2>actual &&
			test_cmp expect actual
2023-03-28 14:04:26 +00:00
		)
	'

2022-08-14 16:55:09 +00:00
	test_expect_success 'complains about multiple pack bitmaps' '
		rm -fr repo &&
		git init repo &&
		test_when_finished "rm -fr repo" &&
		(
			cd repo &&
			git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&

			test_commit base &&

			git repack -adb &&
			bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
			mv "$bitmap" "$bitmap.bak" &&

			test_commit other &&
			git repack -ab &&

			mv "$bitmap.bak" "$bitmap" &&

			find .git/objects/pack -type f -name "*.pack" >packs &&
			find .git/objects/pack -type f -name "*.bitmap" >bitmaps &&
			test_line_count = 2 packs &&
			test_line_count = 2 bitmaps &&

2022-11-10 07:10:12 +00:00
			GIT_TRACE2_EVENT=$(pwd)/trace2.txt git rev-list --use-bitmap-index HEAD &&
			grep "opened bitmap" trace2.txt &&
			grep "ignoring extra bitmap" trace2.txt
2022-08-14 16:55:09 +00:00
		)
	'
}
2013-12-21 14:00:38 +00:00
|
|
|
|
2022-08-14 16:55:09 +00:00
test_bitmap_cases
2013-12-21 14:00:38 +00:00
|
|
|
|
pack-bitmap.c: use commit boundary during bitmap traversal
When reachability bitmap coverage exists in a repository, Git will use a
different (and hopefully faster) traversal to compute revision walks.
Consider a set of positive and negative tips (which we'll refer to by
their standard bitmap names, "wants" and "haves"). In order to
figure out what objects exist between the tips, the existing traversal
in `prepare_bitmap_walk()` does something like:
1. Consider if we can even compute the set of objects with bitmaps,
and fall back to the usual traversal if we cannot. For example,
pathspec limiting traversals can't be computed using bitmaps (since
they don't know which objects are at which paths). The same is true
of certain kinds of non-trivial object filters.
2. If we can compute the traversal with bitmaps, partition the
(dereferenced) tips into two object lists, "haves", and "wants",
based on whether or not the objects have the UNINTERESTING flag,
respectively.
3. Fall back to the ordinary object traversal if either (a) there are
more than zero haves, none of which are in the bitmapped pack or
MIDX, or (b) there are no wants.
4. Construct a reachability bitmap for the "haves" side by walking
from the revision tips down to any existing bitmaps, OR-ing in any
bitmaps as they are found.
5. Then do the same for the "wants" side, stopping at any objects that
appear in the "haves" bitmap.
6. Filter the results if any object filter (that can be easily
computed with bitmaps alone) was given, and then return back to the
caller.
When there is good bitmap coverage relative to the traversal tips, this
walk is often significantly faster than an ordinary object traversal
because it can visit far fewer objects.
But in certain cases, it can be significantly *slower* than the usual
object traversal. Why? Because we need to compute complete bitmaps on
either side of the walk. If either one (or both) of the sides require
walking many (or all!) objects before they get to an existing bitmap,
the extra bitmap machinery is mostly or all overhead.
One of the benefits, however, is that even if the walk is slower, bitmap
traversals are guaranteed to provide an *exact* answer. Unlike the
traditional object traversal algorithm, which can over-count the results
by not opening trees for older commits, the bitmap walk builds an exact
reachability bitmap for either side, meaning the results are never
over-counted.
But producing non-exact results is OK for our traversal here (both in
the bitmap case and not), as long as the results are over-counted, not
under.
Relaxing the bitmap traversal to allow it to produce over-counted
results gives us the opportunity to make some significant improvements.
Instead of the above, the new algorithm only has to walk from the
*boundary* down to the nearest bitmap, instead of from each of the
UNINTERESTING tips.
The boundary-based approach still has degenerate cases, but we'll show
in a moment that it is often a significant improvement.
The new algorithm works as follows:
1. Build a (partial) bitmap of the haves side by first OR-ing any
bitmap(s) that already exist for UNINTERESTING commits between the
haves and the boundary.
2. For each commit along the boundary, add it as a fill-in traversal
tip (where the traversal terminates once an existing bitmap is
found), and perform fill-in traversal.
3. Build up a complete bitmap of the wants side as usual, stopping any
time we intersect the (partial) haves side.
4. Return the results.
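To make "the boundary" concrete, here is a toy model in Python. It is a
sketch under stated assumptions, not how Git computes it: the history,
the commit names, and the reachable() and boundary() helpers are all
hypothetical, and the boundary is taken to be the uninteresting commits
that are direct parents of interesting ones.

```python
def reachable(parents, tips):
    """Return every commit reachable from the given tips."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def boundary(parents, wants, haves):
    """Return uninteresting commits that are parents of interesting ones."""
    uninteresting = reachable(parents, haves)
    interesting = reachable(parents, wants) - uninteresting
    return {
        p
        for c in interesting
        for p in parents[c]
        if p in uninteresting
    }


# The haves tip sits three commits above the point where the wants
# side forked off (commit "b").
parents = {
    "root": [], "a": ["root"], "b": ["a"], "c": ["b"],
    "have_tip": ["c"], "w1": ["b"], "w2": ["w1"],
}
fill_in_tips = boundary(parents, ["w2"], ["have_tip"])
```

In this toy history the fill-in traversal would start from "b" rather
than from "have_tip", so it never needs to visit "c" or "have_tip" on
its way down to an existing bitmap.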
This is more or less equivalent to using the *old* algorithm with this
invocation:
$ git rev-list --objects --use-bitmap-index $WANTS --not \
$(git rev-list --objects --boundary $WANTS --not $HAVES |
perl -lne 'print $1 if /^-(.*)/')
The new result performs significantly better in many cases, particularly
when the distance from the boundary commit(s) to an existing bitmap is
shorter than the distance from (all of) the have tips to the nearest
bitmapped commit.
Note that when using the old bitmap traversal algorithm, the results can
be *slower* than without bitmaps! Under the new algorithm, the result is
computed faster with bitmaps than without (at the cost of over-counting
the true number of objects in a similar fashion as the non-bitmap
traversal):
# (Computing the number of tagged objects not on any branches
# without bitmaps).
$ time git rev-list --count --objects --tags --not --branches
20
real 0m1.388s
user 0m1.092s
sys 0m0.296s
# (Computing the same query using the old bitmap traversal).
$ time git rev-list --count --objects --tags --not --branches --use-bitmap-index
19
real 0m22.709s
user 0m21.628s
sys 0m1.076s
# (this commit)
$ time git.compile rev-list --count --objects --tags --not --branches --use-bitmap-index
19
real 0m1.518s
user 0m1.234s
sys 0m0.284s
The new algorithm is still slower than not using bitmaps at all, but it
is nearly a 15-fold improvement over the existing traversal.
In a more realistic setting (using my local copy of git.git), I can
observe a similar (if more modest) speed-up:
$ argv="--count --objects --branches --not --tags"
hyperfine \
-n 'no bitmaps' "git.compile rev-list $argv" \
-n 'existing traversal' "git.compile rev-list --use-bitmap-index $argv" \
-n 'boundary traversal' "git.compile -c pack.useBitmapBoundaryTraversal=true rev-list --use-bitmap-index $argv"
Benchmark 1: no bitmaps
Time (mean ± σ): 124.6 ms ± 2.1 ms [User: 103.7 ms, System: 20.8 ms]
Range (min … max): 122.6 ms … 133.1 ms 22 runs
Benchmark 2: existing traversal
Time (mean ± σ): 368.6 ms ± 3.0 ms [User: 325.3 ms, System: 43.1 ms]
Range (min … max): 365.1 ms … 374.8 ms 10 runs
Benchmark 3: boundary traversal
Time (mean ± σ): 167.6 ms ± 0.9 ms [User: 139.5 ms, System: 27.9 ms]
Range (min … max): 166.1 ms … 169.2 ms 17 runs
Summary
'no bitmaps' ran
1.34 ± 0.02 times faster than 'boundary traversal'
2.96 ± 0.05 times faster than 'existing traversal'
Here, the new algorithm is also still slower than not using bitmaps, but
represents a more than 2-fold improvement over the existing traversal in
a more modest example.
Since this algorithm was originally written (nearly a year and a half
ago, at the time of writing), the bitmap lookup table shipped, making
the new algorithm's result more competitive. A few other future
directions for improving bitmap traversal times beyond not using bitmaps
at all:
- Decrease the cost to decompress and OR together many bitmaps
(particularly when enumerating the uninteresting side of
the walk). Here we could explore more efficient bitmap storage
techniques, like Roaring+Run and/or use SIMD instructions to speed
up ORing them together.
- Store pseudo-merge bitmaps, which could allow us to OR together
fewer "summary" bitmaps (which would also help with the above).
Helped-by: Jeff King <peff@peff.net>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-08 17:38:12 +00:00
GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=1
export GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL

test_bitmap_cases

sane_unset GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL

2016-12-28 22:45:42 +00:00
test_expect_success 'incremental repack fails when bitmaps are requested' '
2013-12-21 14:00:38 +00:00
	test_commit more-1 &&
2016-12-28 22:45:42 +00:00
	test_must_fail git repack -d 2>err &&
2023-10-31 05:23:30 +00:00
	test_grep "Incremental repacks are incompatible with bitmap" err
2013-12-21 14:00:38 +00:00
'

test_expect_success 'incremental repack can disable bitmaps' '
	test_commit more-2 &&
	git repack -d --no-write-bitmap-index
'

pack-bitmap.c: use commit boundary during bitmap traversal
When reachability bitmap coverage exists in a repository, Git will use a
different (and hopefully faster) traversal to compute revision walks.
Consider a set of positive and negative tips (which we'll refer to by
their standard bitmap names, "wants" and "haves"). In order to
figure out what objects exist between the tips, the existing traversal
in `prepare_bitmap_walk()` does something like:
1. Consider if we can even compute the set of objects with bitmaps,
and fall back to the usual traversal if we cannot. For example,
pathspec limiting traversals can't be computed using bitmaps (since
they don't know which objects are at which paths). The same is true
of certain kinds of non-trivial object filters.
2. If we can compute the traversal with bitmaps, partition the
(dereferenced) tips into two object lists, "haves", and "wants",
based on whether or not the objects have the UNINTERESTING flag,
respectively.
3. Fall back to the ordinary object traversal if either (a) there are
more than zero haves, none of which are in the bitmapped pack or
MIDX, or (b) there are no wants.
4. Construct a reachability bitmap for the "haves" side by walking
from the revision tips down to any existing bitmaps, OR-ing in any
bitmaps as they are found.
5. Then do the same for the "wants" side, stopping at any objects that
appear in the "haves" bitmap.
6. Filter the results if any object filter (that can be easily
computed with bitmaps alone) was given, and then return back to the
caller.
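The fallback test in step 3 can be sketched as a small shell predicate. This is only an illustration: `should_fall_back` and its three count arguments are hypothetical and are not part of Git, which makes this decision in C inside `prepare_bitmap_walk()`.

```shell
# Hypothetical helper, not part of Git.
# usage: should_fall_back <n_haves> <n_haves_in_bitmap> <n_wants>
# Succeeds (exit 0) when the ordinary object traversal should be used.
should_fall_back () {
	n_haves=$1
	n_haves_in_bitmap=$2
	n_wants=$3

	# (a) there are haves, but none are in the bitmapped pack or MIDX
	if test "$n_haves" -gt 0 && test "$n_haves_in_bitmap" -eq 0
	then
		return 0
	fi

	# (b) there are no wants
	test "$n_wants" -eq 0
}

if should_fall_back 3 0 2
then
	echo "fall back to the ordinary traversal"
fi
```

With three haves, none of them in the bitmapped pack, condition (a) holds and the predicate succeeds regardless of the number of wants.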
When there is good bitmap coverage relative to the traversal tips, this
walk is often significantly faster than an ordinary object traversal
because it can visit far fewer objects.
But in certain cases, it can be significantly *slower* than the usual
object traversal. Why? Because we need to compute complete bitmaps on
either side of the walk. If either one (or both) of the sides require
walking many (or all!) objects before they get to an existing bitmap,
the extra bitmap machinery is mostly or all overhead.
One of the benefits, however, is that even if the walk is slower, bitmap
traversals are guaranteed to provide an *exact* answer. Unlike the
traditional object traversal algorithm, which can over-count the results
by not opening trees for older commits, the bitmap walk builds an exact
reachability bitmap for either side, meaning the results are never
over-counted.
But producing non-exact results is OK for our traversal here (both in
the bitmap case and not), as long as the results are over-counted, not
under.
Relaxing the bitmap traversal to allow it to produce over-counted
results gives us the opportunity to make some significant improvements.
Instead of the above, the new algorithm only has to walk from the
*boundary* down to the nearest bitmap, instead of from each of the
UNINTERESTING tips.
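To see why an incomplete haves bitmap can only over-count, here is a toy sketch using shell arithmetic, where each bit stands for one object. The bit patterns are made up for illustration and do not come from a real repository.

```shell
# Toy bitmaps (made-up values): bit i set means "object i is reachable".
wants=$(( 0x3f ))          # objects 0..5
exact_haves=$(( 0x0f ))    # objects 0..3, the complete haves bitmap
partial_haves=$(( 0x0c ))  # objects 2..3 only, an incomplete haves bitmap

# Subtract each version of the haves bitmap from the wants bitmap.
exact=$(( wants & ~exact_haves ))
relaxed=$(( wants & ~partial_haves ))

printf 'exact:   0x%02x\n' "$exact"
printf 'relaxed: 0x%02x\n' "$relaxed"

# Every object in the exact answer also appears in the relaxed one.
test $(( exact & ~relaxed )) -eq 0 && echo "relaxed is a superset of exact"
```

In this toy case the relaxed result carries two extra objects (bits 0 and 1) but misses none, which is the kind of over-counting the traversal can tolerate.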
The boundary-based approach still has degenerate cases, but we'll show
in a moment that it is often a significant improvement.
The new algorithm works as follows:
1. Build a (partial) bitmap of the haves side by first OR-ing any
bitmap(s) that already exist for UNINTERESTING commits between the
haves and the boundary.
2. For each commit along the boundary, add it as a fill-in traversal
tip (where the traversal terminates once an existing bitmap is
found), and perform fill-in traversal.
3. Build up a complete bitmap of the wants side as usual, stopping any
time we intersect the (partial) haves side.
4. Return the results.
The new algorithm is more-or-less equivalent to using the *old*
algorithm with this invocation:
$ git rev-list --objects --use-bitmap-index $WANTS --not \
$(git rev-list --objects --boundary $WANTS --not $HAVES |
perl -lne 'print $1 if /^-(.*)/')
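The `perl` filter in that pipeline keeps only the boundary entries, which `git rev-list --boundary` prints with a leading `-`. As a sketch, the same filtering can be done with `sed`; the helper name `boundary_only` and the sample input are made up for illustration.

```shell
# Hypothetical helper: read rev-list output on stdin and print only the
# boundary entries (lines starting with "-"), with the "-" removed.
boundary_only () {
	sed -n 's/^-//p'
}

# Placeholder names stand in for real object IDs.
printf '%s\n' aaa -bbb ccc -ddd | boundary_only
```

Only `bbb` and `ddd` pass through the filter, since they are the entries marked as boundary commits in the sample input.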
The new result performs significantly better in many cases, particularly
when the distance from the boundary commit(s) to an existing bitmap is
shorter than the distance from (all of) the have tips to the nearest
bitmapped commit.
Note that when using the old bitmap traversal algorithm, the results can
be computed *slower* than without bitmaps! Under the new algorithm, the
result is computed in roughly the same time as without bitmaps (at the
cost of potentially over-counting the true number of objects, in a
similar fashion to the non-bitmap traversal):
# (Computing the number of tagged objects not on any branches
# without bitmaps).
$ time git rev-list --count --objects --tags --not --branches
20
real 0m1.388s
user 0m1.092s
sys 0m0.296s
# (Computing the same query using the old bitmap traversal).
$ time git rev-list --count --objects --tags --not --branches --use-bitmap-index
19
real 0m22.709s
user 0m21.628s
sys 0m1.076s
# (this commit)
$ time git.compile rev-list --count --objects --tags --not --branches --use-bitmap-index
19
real 0m1.518s
user 0m1.234s
sys 0m0.284s
The new algorithm is still slower than not using bitmaps at all, but it
is nearly a 15-fold improvement over the existing traversal.
In a more realistic setting (using my local copy of git.git), I can
observe a similar (if more modest) speed-up:
$ argv="--count --objects --branches --not --tags"
hyperfine \
-n 'no bitmaps' "git.compile rev-list $argv" \
-n 'existing traversal' "git.compile rev-list --use-bitmap-index $argv" \
-n 'boundary traversal' "git.compile -c pack.useBitmapBoundaryTraversal=true rev-list --use-bitmap-index $argv"
Benchmark 1: no bitmaps
Time (mean ± σ): 124.6 ms ± 2.1 ms [User: 103.7 ms, System: 20.8 ms]
Range (min … max): 122.6 ms … 133.1 ms 22 runs
Benchmark 2: existing traversal
Time (mean ± σ): 368.6 ms ± 3.0 ms [User: 325.3 ms, System: 43.1 ms]
Range (min … max): 365.1 ms … 374.8 ms 10 runs
Benchmark 3: boundary traversal
Time (mean ± σ): 167.6 ms ± 0.9 ms [User: 139.5 ms, System: 27.9 ms]
Range (min … max): 166.1 ms … 169.2 ms 17 runs
Summary
'no bitmaps' ran
1.34 ± 0.02 times faster than 'boundary traversal'
2.96 ± 0.05 times faster than 'existing traversal'
Here, the new algorithm is also still slower than not using bitmaps, but
represents a more than 2-fold improvement over the existing traversal in
a more modest example.
Since this algorithm was originally written (nearly a year and a half
ago, at the time of writing), the bitmap lookup table shipped, making
the new algorithm's result more competitive. A few other future
directions for improving bitmap traversal times beyond not using bitmaps
at all:
- Decrease the cost to decompress and OR together many bitmaps
(particularly when enumerating the uninteresting side of
the walk). Here we could explore more efficient bitmap storage
techniques, like Roaring+Run and/or use SIMD instructions to speed
up ORing them together.
- Store pseudo-merge bitmaps, which could allow us to OR together
fewer "summary" bitmaps (which would also help with the above).
Helped-by: Jeff King <peff@peff.net>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-08 17:38:12 +00:00
|
|
|
test_expect_success 'boundary-based traversal is used when requested' '
|
|
|
|
git repack -a -d --write-bitmap-index &&
|
|
|
|
|
|
|
|
for argv in \
|
|
|
|
"git -c pack.useBitmapBoundaryTraversal=true" \
|
|
|
|
"git -c feature.experimental=true" \
|
|
|
|
"GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=1 git"
|
|
|
|
do
|
|
|
|
eval "GIT_TRACE2_EVENT=1 $argv rev-list --objects \
|
|
|
|
--use-bitmap-index second..other 2>perf" &&
|
|
|
|
grep "\"region_enter\".*\"label\":\"haves/boundary\"" perf ||
|
|
|
|
return 1
|
|
|
|
done &&
|
|
|
|
|
|
|
|
for argv in \
|
|
|
|
"git -c pack.useBitmapBoundaryTraversal=false" \
|
|
|
|
"git -c feature.experimental=true -c pack.useBitmapBoundaryTraversal=false" \
|
|
|
|
"GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=0 git -c pack.useBitmapBoundaryTraversal=true" \
|
|
|
|
"GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL=0 git -c feature.experimental=true"
|
|
|
|
do
|
|
|
|
eval "GIT_TRACE2_EVENT=1 $argv rev-list --objects \
|
|
|
|
--use-bitmap-index second..other 2>perf" &&
|
|
|
|
grep "\"region_enter\".*\"label\":\"haves/classic\"" perf ||
|
|
|
|
return 1
|
|
|
|
done
|
|
|
|
'
|
|
|
|
|
2022-08-14 16:55:09 +00:00
|
|
|
test_bitmap_cases "pack.writeBitmapLookupTable"
|
add `ignore_missing_links` mode to revwalk
When pack-objects is computing the reachability bitmap to
serve a fetch request, it can erroneously die() if some of
the UNINTERESTING objects are not present. Upload-pack
throws away HAVE lines from the client for objects we do not
have, but we may have a tip object without all of its
ancestors (e.g., if the tip is no longer reachable and was
new enough to survive a `git prune`, but some of its
reachable objects did get pruned).
In the non-bitmap case, we do a revision walk with the HAVE
objects marked as UNINTERESTING. The revision walker
explicitly ignores errors in accessing UNINTERESTING commits
to handle this case (and we do not bother looking at
UNINTERESTING trees or blobs at all).
When we have bitmaps, however, the process is quite
different. The bitmap index for a pack-objects run is
calculated in two separate steps:
First, we perform an extensive walk from all the HAVEs to
find the full set of objects reachable from them. This walk
is usually optimized away because we are expected to hit an
object with a bitmap during the traversal, which allows us
to terminate early.
Secondly, we perform an extensive walk from all the WANTs,
which usually also terminates early because we hit a commit
with an existing bitmap.
Once we have the resulting bitmaps from the two walks, we
AND-NOT them together to obtain the resulting set of objects
we need to pack.
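As a toy illustration of the AND-NOT step, the sketch below treats a bitmap as an integer in which each bit stands for one object. The helper `and_not` and the bit patterns are hypothetical; Git's real bitmaps are compressed and handled in C.

```shell
# Hypothetical helper: print the bits set in $1 but not in $2.
and_not () {
	echo $(( $1 & ~$2 ))
}

wants=$(( 0x3f ))   # objects 0..5 are reachable from the WANTs
haves=$(( 0x0f ))   # objects 0..3 are reachable from the HAVEs

to_pack=$(and_not "$wants" "$haves")
printf 'objects to pack: 0x%x\n' "$to_pack"
```

Only bits 4 and 5 survive, meaning the two objects the client does not already have.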
When we are walking the HAVE objects, the revision walker
does not know that we are walking it only to mark the
results as uninteresting. We strip out the UNINTERESTING flag,
because those objects _are_ interesting to us during the
first walk. We want to keep going to get a complete set of
reachable objects if we can.
We need some way to tell the revision walker that it's OK to
silently truncate the HAVE walk, just like it does for the
UNINTERESTING case. This patch introduces a new
`ignore_missing_links` flag to the `rev_info` struct, which
we set only for the HAVE walk.
It also adds tests to cover UNINTERESTING objects missing
from several positions: a missing blob, a missing tree, and
a missing parent commit. The missing blob already worked (as
we do not care about its contents at all), but the other two
cases caused us to die().
Note that there are a few cases we do not need to test:
1. We do not need to test a missing tree, with the blob
still present. Without the tree that refers to it, we
would not know that the blob is relevant to our walk.
2. We do not need to test a tip commit that is missing.
Upload-pack omits these for us (and in fact, we
complain even in the non-bitmap case if it fails to do
so).
Reported-by: Siddharth Agarwal <sid0@fb.com>
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-28 10:00:43 +00:00
|
|
|
|
2022-08-14 16:55:09 +00:00
|
|
|
test_expect_success 'verify writing bitmap lookup table when enabled' '
|
|
|
|
GIT_TRACE2_EVENT="$(pwd)/trace2" \
|
|
|
|
git repack -ad &&
|
|
|
|
grep "\"label\":\"writing_lookup_table\"" trace2
|
pack-bitmap.c: gracefully fallback after opening pack/MIDX
When opening a MIDX/pack-bitmap, we call open_midx_bitmap_1() or
open_pack_bitmap_1() respectively in a loop over the set of MIDXs/packs.
By design, these functions are supposed to be called over every pack and
MIDX, since only one of them should have a valid bitmap.
Ordinarily we return '0' from these two functions in order to indicate
that we successfully loaded a bitmap. To signal that we couldn't load a
bitmap corresponding to the MIDX/pack (either because one doesn't exist,
or because there was an error with loading it), we can return '-1'. In
either case, the callers each enumerate all MIDXs/packs to ensure that
at most one bitmap per kind is present.
But when we fail to load a bitmap that does exist (for example, loading
a MIDX bitmap without finding a corresponding reverse index), we'll
return -1 but leave the 'midx' field non-NULL. So when we fall back to
loading a pack bitmap, we'll complain that the bitmap we're trying to
populate is already "opened", even though it isn't.
Rectify this by setting the '->pack' and '->midx' fields back to NULL as
appropriate. Two tests are added: one to ensure that the MIDX-to-pack
bitmap fallback works, and another to ensure we still complain when
there are multiple pack bitmaps in a repository.
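The failure mode and the fix can be sketched in shell. Everything here is hypothetical (the `bitmap_source` variable and both functions are stand-ins, not Git's C code); the point is only that a failed attempt must clear its partial state before the fallback runs.

```shell
bitmap_source=   # empty means "no bitmap opened yet"

# Stand-in for a MIDX bitmap load that fails partway through.
try_midx_bitmap () {
	bitmap_source=midx   # state is recorded before loading finishes
	# ... loading fails here (say, no reverse index) ...
	bitmap_source=       # the fix: reset the state on the error path
	return 1
}

# Stand-in for the pack bitmap load, which refuses to run twice.
try_pack_bitmap () {
	if test -n "$bitmap_source"
	then
		echo "bitmap already opened" >&2
		return 1
	fi
	bitmap_source=pack
}

try_midx_bitmap || try_pack_bitmap
echo "bitmap source: $bitmap_source"
```

Without the reset in `try_midx_bitmap`, `try_pack_bitmap` would report "bitmap already opened", which mirrors the spurious complaint described above.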
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-25 22:41:20 +00:00
|
|
|
'
|
|
|
|
|
2022-08-14 16:55:10 +00:00
|
|
|
test_expect_success 'truncated bitmap fails gracefully (lookup table)' '
|
|
|
|
test_config pack.writebitmaphashcache false &&
|
|
|
|
git repack -adb &&
|
|
|
|
git rev-list --use-bitmap-index --count --all >expect &&
|
|
|
|
bitmap=$(ls .git/objects/pack/*.bitmap) &&
|
|
|
|
test_when_finished "rm -f $bitmap" &&
|
|
|
|
test_copy_bytes 512 <$bitmap >$bitmap.tmp &&
|
|
|
|
mv -f $bitmap.tmp $bitmap &&
|
|
|
|
git rev-list --use-bitmap-index --count --all >actual 2>stderr &&
|
|
|
|
test_cmp expect actual &&
|
2023-10-31 05:23:30 +00:00
|
|
|
test_grep corrupted.bitmap.index stderr
|
2022-08-14 16:55:10 +00:00
|
|
|
'
|
|
|
|
|
2013-12-21 14:00:38 +00:00
|
|
|
test_done
|