git/t/perf/p5303-many-packs.sh
Jeff King 67bb65de5d packfile: actually set approximate_object_count_valid
The approximate_object_count() function tries to compute the count only
once per process. But ever since it was introduced in 8e3f52d778
(find_unique_abbrev: move logic out of get_short_sha1(), 2016-10-03), we
failed to actually set the "valid" flag, meaning we'd compute it fresh
on every call.

This turns out not to be _too_ bad, because we're only iterating through
the packed_git list, and not making any system calls. But since it may
get called for every abbreviated hash we output, even this can add up if
you have many packs.

Here are before-and-after timings for a new perf test which just asks
rev-list to abbreviate each commit hash (the test repo is linux.git,
with commit-graphs):

  Test                            origin              HEAD
  ----------------------------------------------------------------------------
  5303.3: rev-list (1)            28.91(28.46+0.44)   29.03(28.65+0.38) +0.4%
  5303.4: abbrev-commit (1)       1.18(1.06+0.11)     1.17(1.02+0.14) -0.8%
  5303.7: rev-list (50)           28.95(28.56+0.38)   29.50(29.17+0.32) +1.9%
  5303.8: abbrev-commit (50)      3.67(3.56+0.10)     3.57(3.42+0.15) -2.7%
  5303.11: rev-list (1000)        30.34(29.89+0.43)   30.82(30.35+0.46) +1.6%
  5303.12: abbrev-commit (1000)   86.82(86.52+0.29)   77.82(77.59+0.22) -10.4%
  5303.15: load 10,000 packs      0.08(0.02+0.05)     0.08(0.02+0.06) +0.0%

It doesn't help at all when we have 1 pack (5303.4), but we get a 10%
speedup when there are 1000 packs (5303.12). That's a modest speedup for
a case that's already slow and we'd hope to avoid in general (note how
slow it is even after, because we have to look in each of those packs
for abbreviations). But it's a one-line change that clearly matches the
original intent, so it seems worth doing.

The included perf test may also be useful for keeping an eye on any
regressions in the overall abbreviation code.

Reported-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-17 11:36:14 -07:00

110 lines
3 KiB
Bash
Executable file

#!/bin/sh
test_description='performance with large numbers of packs'
. ./perf-lib.sh
test_perf_large_repo
# A real many-pack situation would probably come from having a lot of pushes
# over time. We don't know how big each push would be, but we can fake it by
# just walking the first-parent chain and having every 5 commits be their own
# "push". This isn't _entirely_ accurate, as real pushes would have some
# duplicate objects due to thin-pack fixing, but it's a reasonable
# approximation.
#
# And then all of the rest of the objects can go in a single packfile that
# represents the state before any of those pushes (actually, we'll generate
# that first because in such a setup it would be the oldest pack, and we sort
# the packs by reverse mtime inside git).
repack_into_n () {
rm -rf staging &&
mkdir staging &&
git rev-list --first-parent HEAD |
sed -n '1~5p' |
head -n "$1" |
perl -e 'print reverse <>' \
>pushes
# create base packfile
head -n 1 pushes |
git pack-objects --delta-base-offset --revs staging/pack
# and then incrementals between each pair of commits
last= &&
while read rev
do
if test -n "$last"; then
{
echo "$rev" &&
echo "^$last"
} |
git pack-objects --delta-base-offset --revs \
staging/pack || return 1
fi
last=$rev
done <pushes &&
# and install the whole thing
rm -f .git/objects/pack/* &&
mv staging/* .git/objects/pack/
}
# Pretend we just have a single branch and no reflogs, and that everything is
# in objects/pack; that makes our fake pack-building via repack_into_n()
# much simpler.
test_expect_success 'simplify reachability' '
tip=$(git rev-parse --verify HEAD) &&
git for-each-ref --format="option no-deref%0adelete %(refname)" |
git update-ref --stdin &&
rm -rf .git/logs &&
git update-ref refs/heads/master $tip &&
git symbolic-ref HEAD refs/heads/master &&
git repack -ad
'
for nr_packs in 1 50 1000
do
test_expect_success "create $nr_packs-pack scenario" '
repack_into_n $nr_packs
'
test_perf "rev-list ($nr_packs)" '
git rev-list --objects --all >/dev/null
'
test_perf "abbrev-commit ($nr_packs)" '
git rev-list --abbrev-commit HEAD >/dev/null
'
# This simulates the interesting part of the repack, which is the
# actual pack generation, without smudging the on-disk setup
# between trials.
test_perf "repack ($nr_packs)" '
GIT_TEST_FULL_IN_PACK_ARRAY=1 \
git pack-objects --keep-true-parents \
--honor-pack-keep --non-empty --all \
--reflog --indexed-objects --delta-base-offset \
--stdout </dev/null >/dev/null
'
done
# Measure pack loading with 10,000 packs.
test_expect_success 'generate lots of packs' '
for i in $(test_seq 10000); do
echo "blob"
echo "data <<EOF"
echo "blob $i"
echo "EOF"
echo "checkpoint"
done |
git -c fastimport.unpackLimit=0 fast-import
'
# The purpose of this test is to evaluate load time for a large number
# of packs while doing as little other work as possible.
test_perf "load 10,000 packs" '
git rev-parse --verify "HEAD^{commit}"
'
test_done