These new performance tests demonstrate effectively the same behavior as
p5310, but use a multi-pack bitmap instead of a single-pack one.
Notably, p5326 does not create a MIDX bitmap with multiple packs. This
is so we can measure a direct comparison between it and p5310. Any
difference between the two is measuring just the overhead of using MIDX
bitmaps.
Here are the results of p5310 and p5326 together, measured at the same
time and on the same machine (using a Xenon W-2255 CPU):
Test HEAD
------------------------------------------------------------------------
5310.2: repack to disk 96.78(93.39+11.33)
5310.3: simulated clone 9.98(9.79+0.19)
5310.4: simulated fetch 1.75(4.26+0.19)
5310.5: pack to file (bitmap) 28.20(27.87+8.70)
5310.6: rev-list (commits) 0.41(0.36+0.05)
5310.7: rev-list (objects) 1.61(1.54+0.07)
5310.8: rev-list count with blob:none 0.25(0.21+0.04)
5310.9: rev-list count with blob:limit=1k 2.65(2.54+0.10)
5310.10: rev-list count with tree:0 0.23(0.19+0.04)
5310.11: simulated partial clone 4.34(4.21+0.12)
5310.13: clone (partial bitmap) 11.05(12.21+0.48)
5310.14: pack to file (partial bitmap) 31.25(34.22+3.70)
5310.15: rev-list with tree filter (partial bitmap) 0.26(0.22+0.04)
versus the same tests (this time using a multi-pack index):
Test HEAD
------------------------------------------------------------------------
5326.2: setup multi-pack index 78.99(75.29+11.58)
5326.3: simulated clone 11.78(11.56+0.22)
5326.4: simulated fetch 1.70(4.49+0.13)
5326.5: pack to file (bitmap) 28.02(27.72+8.76)
5326.6: rev-list (commits) 0.42(0.36+0.06)
5326.7: rev-list (objects) 1.65(1.58+0.06)
5326.8: rev-list count with blob:none 0.26(0.21+0.05)
5326.9: rev-list count with blob:limit=1k 2.97(2.86+0.10)
5326.10: rev-list count with tree:0 0.25(0.20+0.04)
5326.11: simulated partial clone 5.65(5.49+0.16)
5326.13: clone (partial bitmap) 12.22(13.43+0.38)
5326.14: pack to file (partial bitmap) 30.05(31.57+7.25)
5326.15: rev-list with tree filter (partial bitmap) 0.24(0.20+0.04)
There is slight overhead in "simulated clone", "simulated partial
clone", and "clone (partial bitmap)". Unsurprisingly, that overhead is
due to using the MIDX's reverse index to map between bit positions and
MIDX positions.
This can be reproduced by running "git repack -adb" along with "git
multi-pack-index write --bitmap" in a large-ish repository. Then run:
$ perf record -o pack.perf git -c core.multiPackIndex=false \
pack-objects --all --stdout >/dev/null </dev/null
$ perf record -o midx.perf git -c core.multiPackIndex=true \
pack-objects --all --stdout >/dev/null </dev/null
and compare the two with "perf diff -c delta -o 1 pack.perf midx.perf".
The most notable results are below (the next largest positive delta is
+0.14%):
# Event 'cycles'
#
# Baseline Delta Shared Object Symbol
# ........ ....... .................. ..........................
#
+5.86% git [.] nth_midxed_offset
+5.24% git [.] nth_midxed_pack_int_id
3.45% +0.97% git [.] offset_to_pack_pos
3.30% +0.57% git [.] pack_pos_to_offset
+0.30% git [.] pack_pos_to_midx
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new p5326 introduced by the next patch will want these same tests,
interjecting its own setup in between. Move them out so that both perf
tests can reuse them.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By using shorter names for the test repos, we will get a slightly more
compressed performance summary without comprimising clarity.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we increase our list of commands to test in
p2000-sparse-operations.sh, we will want to have a slightly smaller test
repository. Reduce the size by a factor of four by reducing the depth of
the step that creates a big index around a moderately-sized repository.
Also add a step to run 'git checkout -' on repeat. This requires having
a previous location in the reflog, so add that to the initialization
steps.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rewrite the backend for "diff -G/-S" to use pcre2 engine when
available.
* ab/pickaxe-pcre2: (22 commits)
xdiff-interface: replace discard_hunk_line() with a flag
xdiff users: use designated initializers for out_line
pickaxe -G: don't special-case create/delete
pickaxe -G: terminate early on matching lines
xdiff-interface: allow early return from xdiff_emit_line_fn
xdiff-interface: prepare for allowing early return
pickaxe -S: slightly optimize contains()
pickaxe: rename variables in has_changes() for brevity
pickaxe -S: support content with NULs under --pickaxe-regex
pickaxe: assert that we must have a needle under -G or -S
pickaxe: refactor function selection in diffcore-pickaxe()
perf: add performance test for pickaxe
pickaxe/style: consolidate declarations and assignments
diff.h: move pickaxe fields together again
pickaxe: die when --find-object and --pickaxe-all are combined
pickaxe: die when -G and --pickaxe-regex are combined
pickaxe tests: add missing test for --no-pickaxe-regex being an error
pickaxe tests: test for -G, -S and --find-object incompatibility
pickaxe tests: add test for "log -S" not being a regex
pickaxe tests: add test for diffgrep_consume() internals
...
When the TEST_OUTPUT_DIRECTORY is defined, then all test data will be
written in that directory instead of the default directory located in
"t/". While this works as expected for our normal tests, performance
tests fail to locate and aggregate performance data because they don't
know to handle TEST_OUTPUT_DIRECTORY correctly and always look at the
default location.
Fix the issue by adding a `--results-dir` parameter to "aggregate.perl"
which identifies the directory where results are and by making the "run"
script awake of the TEST_OUTPUT_DIRECTORY variable.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test for the -G and -S pickaxe options and related options.
This test supports being run with GIT_TEST_LONG=1 to adjust the limit
on the number of commits from 1k to 10k. The 1k limit seems to hit a
good spot on git.git
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git repack -A -d" in a partial clone unnecessarily loosened
objects in promisor pack.
* rs/repack-without-loosening-promised-objects:
repack: avoid loosening promisor objects in partial clones
Builds on top of the sparse-index infrastructure to mark operations
that are not ready to mark with the sparse index, causing them to
fall back on fully-populated index that they always have worked with.
* ds/sparse-index-protections: (47 commits)
name-hash: use expand_to_path()
sparse-index: expand_to_path()
name-hash: don't add directories to name_hash
revision: ensure full index
resolve-undo: ensure full index
read-cache: ensure full index
pathspec: ensure full index
merge-recursive: ensure full index
entry: ensure full index
dir: ensure full index
update-index: ensure full index
stash: ensure full index
rm: ensure full index
merge-index: ensure full index
ls-files: ensure full index
grep: ensure full index
fsck: ensure full index
difftool: ensure full index
commit: ensure full index
checkout: ensure full index
...
When `git repack -A -d` is run in a partial clone, `pack-objects`
is invoked twice: once to repack all promisor objects, and once to
repack all non-promisor objects. The latter `pack-objects` invocation
is with --exclude-promisor-objects and --unpack-unreachable, which
loosens all objects unused during this invocation. Unfortunately,
this includes promisor objects.
Because the -d argument to `git repack` subsequently deletes all loose
objects also in packs, these just-loosened promisor objects will be
immediately deleted. However, this extra disk churn is unnecessary in
the first place. For example, in a newly-cloned partial repo that
filters all blob objects (e.g. `--filter=blob:none`), `repack` ends up
unpacking all trees and commits into the filesystem because every
object, in this particular case, is a promisor object. Depending on
the repo size, this increases the disk usage considerably: In my copy
of the linux.git, the object directory peaked 26GB of more disk usage.
In order to avoid this extra disk churn, pass the names of the promisor
packfiles as --keep-pack arguments to the second invocation of
`pack-objects`. This informs `pack-objects` that the promisor objects
are already in a safe packfile and, therefore, do not need to be
loosened.
For testing, we need to validate whether any object was loosened.
However, the "evidence" (loosened objects) is deleted during the
process which prevents us from inspecting the object directory.
Instead, let's teach `pack-objects` to count loosened objects and
emit via trace2 thus allowing inspecting the debug events after the
process is finished. This new event is used on the added regression
test.
Lastly, add a new perf test to evaluate the performance impact
made by this changes (tested on git.git):
Test HEAD^ HEAD
----------------------------------------------------------
5600.3: gc 134.38(41.93+90.95) 7.80(6.72+1.35) -94.2%
For a bigger repository, such as linux.git, the improvement is
even bigger:
Test HEAD^ HEAD
-------------------------------------------------------------------
5600.3: gc 6833.00(918.07+3162.74) 268.79(227.02+39.18) -96.1%
These improvements are particular big because every object in the
newly-cloned partial repository is a promisor object.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --exclude-promisor-objects is given, before traversing any objects
we iterate over all of the objects in any promisor packs, marking them
as UNINTERESTING and SEEN. We turn the oid we get from iterating the
pack into an object with parse_object(), but this has two problems:
- it's slow; we are zlib inflating (and reconstructing from deltas)
every byte of every object in the packfile
- it leaves the tree buffers attached to their structs, which means
our heap usage will grow to store every uncompressed tree
simultaneously. This can be gigabytes.
We can obviously fix the second by freeing the tree buffers after we've
parsed them. But we can observe that the function doesn't look at the
object contents at all! The only reason we call parse_object() is that
we need a "struct object" on which to set the flags. There are two
options here:
- we can look up just the object type via oid_object_info(), and then
call the appropriate lookup_foo() function
- we can call lookup_unknown_object(), which gives us an OBJ_NONE
struct (which will get auto-converted later by object_as_type() via
calls to lookup_commit(), etc).
The first one is closer to the current code, but we do pay the price to
look up the type for each object. The latter should be more efficient in
CPU, though it wastes a little bit of memory (the "unknown" object
structs are a union of all object types, so some of the structs are
bigger than they need to be). It also runs the risk of triggering a
latent bug in code that calls lookup_object() directly but isn't ready
to handle OBJ_NONE (such code would already be buggy, but we use
lookup_unknown_object() infrequently enough that it might be hiding).
I went with the second option here. I don't think the risk is high (and
we'd want to find and fix any such bugs anyway), and it should be more
efficient overall.
The new tests in p5600 show off the improvement (this is on git.git):
Test HEAD^ HEAD
-------------------------------------------------------------------------------
5600.5: count commits 0.37(0.37+0.00) 0.38(0.38+0.00) +2.7%
5600.6: count non-promisor commits 11.74(11.37+0.37) 0.04(0.03+0.00) -99.7%
The improvement is particularly big in this script because _every_
object in the newly-cloned partial repo is a promisor object. So after
marking them all, there's nothing left to traverse.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To get the list of all promisor objects, we not only include all objects
in promisor packs, but also parse each of those objects to see which
objects they reference. After parsing a tree object, the tree->buffer
field will remain populated until we explicitly free it. So in a partial
clone of blob:none, for example, we are essentially reading every tree
in the repository (since they're all in the initial promisor pack), and
keeping all of their uncompressed contents in memory at once.
This patch frees the tree buffers after we've finished marking all of
their reachable objects. We shouldn't need to do this for any other
object type. While we are using some extra memory to store the structs,
no other object type stores the whole contents in its parsed form (we do
sometimes hold on to commit buffers, but less so these days due to
commit graphs, plus most commands which care about promisor objects turn
off the save_commit_buffer global).
Even for a moderate-sized repository like git.git, this patch drops the
peak heap (as measured by massif) for git-fsck from ~1.7GB to ~138MB.
Fsck is a good candidate for measuring here because it doesn't interact
with the promisor code except to call is_promisor_object(), so we can
isolate just this problem.
The added perf test shows only a tiny improvement on my machine for
git.git, since 1.7GB isn't enough to cause any real memory pressure:
Test HEAD^ HEAD
--------------------------------------------------------------------------------
5600.4: fsck 21.26(20.90+0.35) 20.84(20.79+0.04) -2.0%
With linux.git the absolute change is a bit bigger, though still a small
percentage:
Test HEAD^ HEAD
-----------------------------------------------------------------------------
5600.4: fsck 262.26(259.13+3.12) 254.92(254.62+0.29) -2.8%
I didn't have the patience to run it under massif with linux.git, but
it's probably on the order of about 14GB improvement, since that's the
sum of the sizes of all of the uncompressed trees (but still isn't
enough to create memory pressure on this particular machine, which has
64GB of RAM). Smaller machines would probably see a bigger effect on
runtime (and sadly our perf suite does not measure peak heap).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Optimize "rev-list --use-bitmap-index --objects" corner case that
uses negative tags as the stopping points.
* ps/pack-bitmap-optim:
pack-bitmap: avoid traversal of objects referenced by uninteresting tag
p2000-sparse-operations.sh compares different Git commands in
repositories with many files at HEAD but using sparse-checkout to focus
on a small portion of those files.
Add extra copies of the repository that use the sparse-index format so
we can track how that affects the performance of different commands.
At this point in time, the sparse-index is 100% overhead from the CPU
front, and this is measurable in these tests:
Test
---------------------------------------------------------------
2000.2: git status (full-index-v3) 0.59(0.51+0.12)
2000.3: git status (full-index-v4) 0.59(0.52+0.11)
2000.4: git status (sparse-index-v3) 1.40(1.32+0.12)
2000.5: git status (sparse-index-v4) 1.41(1.36+0.08)
2000.6: git add -A (full-index-v3) 2.32(1.97+0.19)
2000.7: git add -A (full-index-v4) 2.17(1.92+0.14)
2000.8: git add -A (sparse-index-v3) 2.31(2.21+0.15)
2000.9: git add -A (sparse-index-v4) 2.30(2.20+0.13)
2000.10: git add . (full-index-v3) 2.39(2.02+0.20)
2000.11: git add . (full-index-v4) 2.20(1.94+0.16)
2000.12: git add . (sparse-index-v3) 2.36(2.27+0.12)
2000.13: git add . (sparse-index-v4) 2.33(2.21+0.16)
2000.14: git commit -a -m A (full-index-v3) 2.47(2.12+0.20)
2000.15: git commit -a -m A (full-index-v4) 2.26(2.00+0.17)
2000.16: git commit -a -m A (sparse-index-v3) 3.01(2.92+0.16)
2000.17: git commit -a -m A (sparse-index-v4) 3.01(2.94+0.15)
Note that there is very little difference between the v3 and v4 index
formats when the sparse-index is enabled. This is primarily due to the
fact that the relative file sizes are the same, and the command time is
mostly taken up by parsing tree objects to expand the sparse index into
a full one.
With the current file layout, the index file sizes are given by this
table:
| full index | sparse index |
+-------------+--------------+
v3 | 108 MiB | 1.6 MiB |
v4 | 80 MiB | 1.2 MiB |
Future updates will improve the performance of Git commands when the
index is sparse.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a test script that takes the default performance test (the Git
codebase) and multiplies it by 256 using four layers of duplicated
trees of width four. This results in nearly one million blob entries in
the index. Then, we can clone this repository with sparse-checkout
patterns that demonstrate four copies of the initial repository. Each
clone will use a different index format or mode so peformance can be
tested across the different options.
Note that the initial repo is stripped of submodules before doing the
copies. This preserves the expected data shape of the sparse index,
because directories containing submodules are not collapsed to a sparse
directory entry.
Run a few Git commands on these clones, especially those that use the
index (status, add, commit).
Here are the results on my Linux machine:
Test
--------------------------------------------------------------
2000.2: git status (full-index-v3) 0.37(0.30+0.09)
2000.3: git status (full-index-v4) 0.39(0.32+0.10)
2000.4: git add -A (full-index-v3) 1.42(1.06+0.20)
2000.5: git add -A (full-index-v4) 1.26(0.98+0.16)
2000.6: git add . (full-index-v3) 1.40(1.04+0.18)
2000.7: git add . (full-index-v4) 1.26(0.98+0.17)
2000.8: git commit -a -m A (full-index-v3) 1.42(1.11+0.16)
2000.9: git commit -a -m A (full-index-v4) 1.33(1.08+0.16)
It is perhaps noteworthy that there is an improvement when using index
version 4. This is because the v3 index uses 108 MiB while the v4
index uses 80 MiB. Since the repeated portions of the directories are
very short (f3/f1/f2, for example) this ratio is less pronounced than in
similarly-sized real repositories.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff-index" codepath has been taught to trust fsmonitor status
to reduce number of lstat() calls.
* nk/diff-index-fsmonitor:
fsmonitor: add perf test for git diff HEAD
fsmonitor: add assertion that fsmonitor is valid to check_removed
fsmonitor: skip lstat deletion check during git diff-index
"git repack" so far has been only capable of repacking everything
under the sun into a single pack (or split by size). A cleverer
strategy to reduce the cost of repacking a repository has been
introduced.
* tb/geometric-repack:
builtin/pack-objects.c: ignore missing links with --stdin-packs
builtin/repack.c: reword comment around pack-objects flags
builtin/repack.c: be more conservative with unsigned overflows
builtin/repack.c: assign pack split later
t7703: test --geometric repack with loose objects
builtin/repack.c: do not repack single packs with --geometric
builtin/repack.c: add '--geometric' option
packfile: add kept-pack cache for find_kept_pack_entry()
builtin/pack-objects.c: rewrite honor-pack-keep logic
p5303: measure time to repack with keep
p5303: add missing &&-chains
builtin/pack-objects.c: add '--stdin-packs' option
revision: learn '--no-kept-objects'
packfile: introduce 'find_kept_pack_entry()'
Perf test update to work better in secondary worktrees.
* jk/perf-in-worktrees:
t/perf: avoid copying worktree files from test repo
t/perf: handle worktrees as test repos
When preparing the bitmap walk, we first establish the set of of have
and want objects by iterating over the set of pending objects: if an
object is marked as uninteresting, it's declared as an object we already
have, otherwise as an object we want. These two sets are then used to
compute which transitively referenced objects we need to obtain.
One special case here are tag objects: when a tag is requested, we
resolve it to its first not-tag object and add both resolved objects as
well as the tag itself into either the have or want set. Given that the
uninteresting-property always propagates to referenced objects, it is
clear that if the tag is uninteresting, so are its children and vice
versa. But we fail to propagate the flag, which effectively means that
referenced objects will always be interesting except for the case where
they have already been marked as uninteresting explicitly.
This mislabeling does not impact correctness: we now have it in our
"wants" set, and given that we later do an `AND NOT` of the bitmaps of
"wants" and "haves" sets it is clear that the result must be the same.
But we now start to needlessly traverse the tag's referenced objects in
case it is uninteresting, even though we know that each referenced
object will be uninteresting anyway. In the worst case, this can lead to
a complete graph walk just to establish that we do not care for any
object.
Fix the issue by propagating the `UNINTERESTING` flag to pointees of tag
objects and add a benchmark with negative revisions to p5310. This shows
some nice performance benefits, tested with linux.git:
Test HEAD~ HEAD
---------------------------------------------------------------------------------------------------------------
5310.3: repack to disk 193.18(181.46+16.42) 194.61(183.41+15.83) +0.7%
5310.4: simulated clone 25.93(24.88+1.05) 25.81(24.73+1.08) -0.5%
5310.5: simulated fetch 2.64(5.30+0.69) 2.59(5.16+0.65) -1.9%
5310.6: pack to file (bitmap) 58.75(57.56+6.30) 58.29(57.61+5.73) -0.8%
5310.7: rev-list (commits) 1.45(1.18+0.26) 1.46(1.22+0.24) +0.7%
5310.8: rev-list (objects) 15.35(14.22+1.13) 15.30(14.23+1.07) -0.3%
5310.9: rev-list with tag negated via --not --all (objects) 22.49(20.93+1.56) 0.11(0.09+0.01) -99.5%
5310.10: rev-list with negative tag (objects) 0.61(0.44+0.16) 0.51(0.35+0.16) -16.4%
5310.11: rev-list count with blob:none 12.15(11.19+0.96) 12.18(11.19+0.99) +0.2%
5310.12: rev-list count with blob:limit=1k 17.77(15.71+2.06) 17.75(15.63+2.12) -0.1%
5310.13: rev-list count with tree:0 1.69(1.31+0.38) 1.68(1.28+0.39) -0.6%
5310.14: simulated partial clone 20.14(19.15+0.98) 19.98(18.93+1.05) -0.8%
5310.16: clone (partial bitmap) 12.78(13.89+1.07) 12.72(13.99+1.01) -0.5%
5310.17: pack to file (partial bitmap) 42.07(45.44+2.72) 41.44(44.66+2.80) -1.5%
5310.18: rev-list with tree filter (partial bitmap) 0.44(0.29+0.15) 0.46(0.32+0.14) +4.5%
While most benchmarks are probably in the range of noise, the newly
added 5310.9 and 5310.10 benchmarks consistenly perform better.
Signed-off-by: Patrick Steinhardt <ps@pks.im>.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Preliminary changes to fsmonitor integration.
* jh/fsmonitor-prework:
fsmonitor: refactor initialization of fsmonitor_last_update token
fsmonitor: allow all entries for a folder to be invalidated
fsmonitor: log FSMN token when reading and writing the index
fsmonitor: log invocation of FSMonitor hook to trace2
read-cache: log the number of scanned files to trace2
read-cache: log the number of lstat calls to trace2
preload-index: log the number of lstat calls to trace2
p7519: add trace logging during perf test
p7519: move watchman cleanup earlier in the test
p7519: fix watchman watch-list test on Windows
p7519: do not rely on "xargs -d" in test
When running the perf suite, we copy files from an existing $GIT_DIR to
a scratch repository to give us a realistic setup on which to operate.
Since the perf scripts themselves may modify the scratch repository, we
want to make sure we've scrubbed any references back to the original.
One existing example is that we avoid copying the file "commondir" at
the top-level of the repository. In a worktree git-dir (e.g.,
.git/worktrees/foo), that file contains the path to the parent
repository; copying it could mean ref updates in the scratch repository
affect the original.
But there are other files we should cover, too:
- "gitdir" in a worktree git-dir contains the path to the actual .git
file in the working tree. We _shouldn't_ end up looking at it at
all, since the lack of a "commondir" file means Git won't consider
this to be a worktree git-dir. But it's best to err on the safe
side.
- in a parent repository that contains worktrees, the
"$GIT_DIR/worktrees" directory will contain the git dirs for the
individual worktrees. Which will themselves contain commondir and
gitdir files that may reference the original repository. We should
likewise remove them.
Note that this does mean that the perf suite's scratch repositories
will never have any worktrees. That's OK; we don't have any perf tests
that are influenced by their presence. If we add any, they'd
probably want to create the worktrees themselves anyway.
This patch adds both paths to the set of omissions in
test_perf_copy_repo_contents(). Note that we won't get confused here by
matching arbitrary names like refs/heads/commondir. This list is always
matching top-level entries in $GIT_DIR (we rely on "cp -R" to do the
actual recursion).
Suggested-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The perf suite gets confused when test_perf_default_repo is pointed at a
worktree (which includes when it is run from within a worktree at all,
since the default is to use the current repository).
Here's an example:
$ git worktree add ~/foo
Preparing worktree (new branch 'foo')
HEAD is now at 328c109303 The eighth batch
$ cd ~/foo
$ make
[...build output...]
$ cd t/perf
$ ./p0000-perf-lib-sanity.sh -v -i
[...]
perf 1 - test_perf_default_repo works:
running:
foo=$(git rev-parse HEAD) &&
test_export foo
fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
The problem is that we didn't copy all of the necessary files from the
source repository (in this case we got HEAD, but we have no refs!). We
discover the git-dir with "rev-parse --git-dir", but this points to the
worktree's partial repository in .../.git/worktrees/foo.
That partial repository has a "commondir" file which points to the main
repository, where the actual refs are stored, but we don't copy it. This
is the correct thing to do, though! If we did copy it, then our scratch
test repo would be pointing back to the original main repo, and any ref
updates we made in the tests would impact that original repo.
Instead, we need to either:
1. Make a scratch copy of the original main repo (in addition to the
worktree repo), and point the scratch worktree repo's commondir at
it. This preserves the original relationship, but it's doubtful any
script really cares (if they are testing worktree performance,
they'd probably make their own worktrees). And it's trickier to get
right.
2. Collapse the main and worktree repos into a single scratch repo.
This can be done by copying everything from both, preferring any
files from the worktree repo.
This patch does the second one. With this applied, the example above
results in p0000 running successfully.
Reported-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add two new tests to measure repack performance. Both tests split the
repository into synthetic "pushes", and then leave the remaining objects
in a big base pack.
The first new test marks an empty pack as "kept" and then passes
--honor-pack-keep to avoid including objects in it. That doesn't change
the resulting pack, but it does let us compare to the normal repack case
to see how much overhead we add to check whether objects are kept or
not.
The other test is of --stdin-packs, which gives us a sense of how that
number scales based on the number of packs we provide as input. In each
of those tests, the empty pack isn't considered, but the residual pack
(objects that were left over and not included in one of the synthetic
push packs) is marked as kept.
(Note that in the single-pack case of the --stdin-packs test, there is
nothing do since there are no non-excluded packs).
Here are some timings on a recent clone of the kernel:
5303.5: repack (1) 57.26(54.59+10.84)
5303.6: repack with kept (1) 57.33(54.80+10.51)
in the 50-pack case, things start to slow down:
5303.11: repack (50) 71.54(88.57+4.84)
5303.12: repack with kept (50) 85.12(102.05+4.94)
and by the time we hit 1,000 packs, things are substantially worse, even
though the resulting pack produced is the same:
5303.17: repack (1000) 216.87(490.79+14.57)
5303.18: repack with kept (1000) 665.63(938.87+15.76)
That's because the code paths around handling .keep files are known to
scale badly; they look in every single pack file to find each object.
Our solution to that was to notice that most repos don't have keep
files, and to make that case a fast path. But as soon as you add a
single .keep, that part of pack-objects slows down again (even if we
have fewer objects total to look at).
Likewise, the scaling is pretty extreme on --stdin-packs (but each
subsequent test is also being asked to do more work):
5303.7: repack with --stdin-packs (1) 0.01(0.01+0.00)
5303.13: repack with --stdin-packs (50) 3.53(12.07+0.24)
5303.19: repack with --stdin-packs (1000) 195.83(371.82+8.10)
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These are in a helper function, so the usual chain-lint doesn't notice
them. This function is still not perfect, as it has some git invocations
on the left-hand-side of the pipe, but it's primary purpose is timing,
not finding bugs or correctness issues.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add optional trace logging to allow us to better compare performance of
various fsmonitor providers and compare results with non-fsmonitor runs.
Currently, this includes Trace2 logging, but may be extended to include
other trace targets, such as GIT_TRACE_FSMONITOR if desired.
Using this logging helped me explain an odd behavior on MacOS where the
kernel was dropping events and causing the hook to Watchman to timeout.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Shutdown Watchman after the Watchman-based tests and before the block of
"no fsmonitor" tests.
This helps ensure that Watchman cannot affect the test results for the
other.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Only use the final portion of the test trash directory file name
when verifying that Watchman was started.
On Windows and under the SDK, $GIT_WORKTREE is a cygwin-style
path with forward slashes and a "/c/" drive name. However
`watchman watch-list` reports a proper Windows-style pathname
with drive letters and backslashes. This causes the grep to
fail. Since we don't really care about the full pathname (and
we really don't want to bother with normalizaing them), just see
if the test-name portion of the path is found.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the test to use a more portable method to update the mtime on a
large number of files under version control.
The Mac version of xargs does not support the "-d" option.
Likewise, the "-0" and "--null" options are not portable.
Furthermore, use `test-tool chmtime` rather than `touch` to update the
mtime to ensure that it is actually updated (especially on file systems
with only whole second resolution).
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some pretty-format specifiers do not need the data in commit object
(e.g. "%H"), but we were over-eager to load and parse it, which has
been made even lazier.
* jk/pretty-lazy-load-commit:
pretty: lazy-load commit data when expanding user-format
Using "1~5" isn't portable. Nobody seems to have noticed, since perhaps
people don't tend to run the perf suite on more exotic platforms. Still,
it's better to set a good example.
We can use:
perl -ne 'print if $. % 5 == 1'
instead. But we can further observe that perl does a good job of the
other parts of this pipeline, and fold the whole thing together.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we expand a user-format, we try to avoid work that isn't necessary
for the output. For instance, we don't bother parsing the commit header
until we know we need the author, subject, etc.
But we do always load the commit object's contents from disk, even if
the format doesn't require it (e.g., just "%H"). Traditionally this
didn't matter much, because we'd have loaded it as part of the traversal
anyway, and we'd typically have those bytes attached to the commit
struct (or these days, cached in a commit-slab).
But when we have a commit-graph, we might easily get to the point of
pretty-printing a commit without ever having looked at the actual object
contents. We should push off that load (and reencoding) until we're
certain that it's needed.
I think the results of p4205 show the advantage pretty clearly (we serve
parent and tree oids out of the commit struct itself, so they benefit as
well):
# using git.git as the test repo
Test HEAD^ HEAD
----------------------------------------------------------------------
4205.1: log with %H 0.40(0.39+0.01) 0.03(0.02+0.01) -92.5%
4205.2: log with %h 0.45(0.44+0.01) 0.09(0.09+0.00) -80.0%
4205.3: log with %T 0.40(0.39+0.00) 0.04(0.04+0.00) -90.0%
4205.4: log with %t 0.46(0.46+0.00) 0.09(0.08+0.01) -80.4%
4205.5: log with %P 0.39(0.39+0.00) 0.03(0.03+0.00) -92.3%
4205.6: log with %p 0.46(0.46+0.00) 0.10(0.09+0.00) -78.3%
4205.7: log with %h-%h-%h 0.52(0.51+0.01) 0.15(0.14+0.00) -71.2%
4205.8: log with %an-%ae-%s 0.42(0.41+0.00) 0.42(0.41+0.01) +0.0%
# using linux.git as the test repo
Test HEAD^ HEAD
----------------------------------------------------------------------
4205.1: log with %H 7.12(6.97+0.14) 0.76(0.65+0.11) -89.3%
4205.2: log with %h 7.35(7.19+0.16) 1.30(1.19+0.11) -82.3%
4205.3: log with %T 7.58(7.42+0.15) 1.02(0.94+0.08) -86.5%
4205.4: log with %t 8.05(7.89+0.15) 1.55(1.41+0.13) -80.7%
4205.5: log with %P 7.12(7.01+0.10) 0.76(0.69+0.07) -89.3%
4205.6: log with %p 7.38(7.27+0.10) 1.32(1.20+0.12) -82.1%
4205.7: log with %h-%h-%h 7.81(7.67+0.13) 1.79(1.67+0.12) -77.1%
4205.8: log with %an-%ae-%s 7.90(7.74+0.15) 7.81(7.66+0.15) -1.1%
I added the final test to show where we don't improve (the 1% there is
just lucky noise), but also as a regression test to make sure we're not
doing anything stupid like loading the commit multiple times when there
are several placeholders that need it.
Reported-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
p7519 measures the performance of the fsmonitor code. To do this, it
uses the installed copy of Watchman. If Watchman isn't installed, a noop
integration script is installed in its place.
When in the latter mode, it is expected that the script should not write
a "last update token": in fact, it doesn't write anything at all since
the script is blank.
Commit 33226af42b (t/perf/fsmonitor: improve error message if typoing
hook name, 2020-10-26) made sure that running 'git update-index
--fsmonitor' did not write anything to stderr, but this is not the case
when using the empty Watchman script, since Git will complain that:
$ which watchman
watchman not found
$ cat .git/hooks/fsmonitor-empty
$ git -c core.fsmonitor=.git/hooks/fsmonitor-empty update-index --fsmonitor
warning: Empty last update token.
Prior to 33226af42b, the output wasn't checked at all, which allowed
this noop mode to work. But, 33226af42b breaks p7519 when running it
without a 'watchman(1)' on your system.
Handle this by only checking that the stderr is empty only when running
with a real watchman executable. Otherwise, assert that the error
message is the expected one when running in the noop mode.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
test_export() has been self-recursive since its inception even though a
simple for-loop would have served just as well to append its arguments
to the `test_export_` variable separated by the pipe character "|".
Recently `test_export_` was changed instead to a space-separated list of
tokens to be exported, an operation which can be accomplished via a
single simple assignment, with no need for looping or recursion.
Therefore, simplify the implementation.
While at it, take advantage of the fact that variable names to be
exported are shell identifiers, thus won't be composed of special
characters or whitespace, thus simple a `$*` can be used rather than
magical `"$@"`.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
test_perf() runs each test in its own subshell which makes it difficult
to persist variables between tests. test_export() addresses this
shortcoming by grabbing the values of specified variables after a test
runs but before the subshell exits, and writes those values to a file
which is loaded into the environment of subsequent tests.
To grab the values to be persisted, test_export() pipes the output of
the shell's builtin `set` command through `sed` which plucks them out
using a regular expression along the lines of `s/^(var1|var2)/.../p`.
Unfortunately, though, this use of alternation is not portable. For
instance, BSD-lineage `sed` (including macOS `sed`) does not support it
in the default "basic regular expression" mode (BRE). It may be possible
to enable "extended regular expression" mode (ERE) in some cases with
`sed -E`, however, `-E` is neither portable nor part of POSIX.
Fortunately, alternation is unnecessary in this case and can easily be
avoided, so replace it with a series of simple expressions such as
`s/^var1/.../p;s/^var2/.../p`.
While at it, tighten the expressions so they match the variable names
exactly rather than matching prefixes (i.e. use `s/^var1=/.../p`).
If the requirements of test_export() become more complex in the future,
then an alternative would be to replace `sed` with `perl` which supports
alternation on all platforms, however, the simple elimination of
alternation via multiple `sed` expressions suffices for the present.
Reported-by: Sangeeta <sangunb09@gmail.com>
Diagnosed-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git update-ref --stdin" learns to take multiple transactions in a
single session.
* ps/update-ref-multi-transaction:
update-ref: disallow "start" for ongoing transactions
p1400: use `git-update-ref --stdin` to test multiple transactions
update-ref: allow creation of multiple transactions
t1400: avoid touching refs on filesystem
Simplify test and make error messages more clear here.
Per feedback from Junio in
33226af42b (t/perf/fsmonitor: improve error message if typoing hook
name, 2020-10-26)
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit 0a0fbbe3ff (refs: remove lookup cache for
reference-transaction hook, 2020-08-25), a new benchmark was added to
p1400 which has the intention to exercise creation of multiple
transactions in a single process. As git-update-ref wasn't yet able to
create multiple transactions with a single run we instead used git-push.
As its non-atomic version creates a transaction per reference update,
this was the best approximation we could make at that point in time.
Now that `git-update-ref --stdin` supports creation of multiple
transactions, let's convert the benchmark to use that instead. It has
less overhead and it's also a lot clearer what the actual intention is.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This benchmark covers the git status time for a heavily
dirty directory - benchmarking fsmonitor's refresh
When running to compare our perl vs rs-git-fsmonitor - we see that
the perl script incurs significant overhead - further motivation
to provide a faster implementation within git.
7519.7: status (dirty) (fsmonitor=query-watchman) 10.05(7.78+1.56)
7519.20: status (dirty) (fsmonitor=rs-git-fsmonitor) 6.72(4.37+1.64)
7519.33: status (dirty) (fsmonitor=disabled) 5.62(4.24+2.03)
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>