p7519 measures the performance of the fsmonitor code. To do this, it
uses the installed copy of Watchman. If Watchman isn't installed, a noop
integration script is installed in its place.
When in the latter mode, it is expected that the script should not write
a "last update token": in fact, it doesn't write anything at all since
the script is blank.
Commit 33226af42b (t/perf/fsmonitor: improve error message if typoing
hook name, 2020-10-26) made sure that running 'git update-index
--fsmonitor' did not write anything to stderr, but this is not the case
when using the empty Watchman script, since Git will complain that:
$ which watchman
watchman not found
$ cat .git/hooks/fsmonitor-empty
$ git -c core.fsmonitor=.git/hooks/fsmonitor-empty update-index --fsmonitor
warning: Empty last update token.
Prior to 33226af42b, the output wasn't checked at all, which allowed
this noop mode to work. But, 33226af42b breaks p7519 when running it
without a 'watchman(1)' on your system.
Handle this by checking that stderr is empty only when running with a
real watchman executable. Otherwise, assert that the error message is
the expected one when running in the noop mode.
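
The check described above could be sketched roughly as follows. This is
only an illustration, not the code in p7519: the helper name
`check_fsmonitor_stderr` and its two arguments are hypothetical, and the
expected warning text is copied from the transcript above.

```shell
# Hypothetical helper (not part of p7519): decide whether the captured
# stderr of 'git update-index --fsmonitor' is acceptable.
#   $1: "real" when a watchman executable is in use, anything else for
#       the noop (empty hook) mode
#   $2: file holding the captured stderr
check_fsmonitor_stderr () {
	mode=$1 errfile=$2
	if test "$mode" = real
	then
		# With a real watchman, nothing may be written to stderr.
		! test -s "$errfile"
	else
		# The empty hook is expected to trigger exactly this warning.
		test "$(cat "$errfile")" = "warning: Empty last update token."
	fi
}
```

A caller would redirect stderr of the update-index invocation to a file
and then pass that file to the helper.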
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
test_export() has been self-recursive since its inception even though a
simple for-loop would have served just as well to append its arguments
to the `test_export_` variable separated by the pipe character "|".
Recently `test_export_` was changed instead to a space-separated list of
tokens to be exported, an operation which can be accomplished via a
single simple assignment, with no need for looping or recursion.
Therefore, simplify the implementation.
While at it, take advantage of the fact that variable names to be
exported are shell identifiers, and thus won't be composed of special
characters or whitespace, so a simple `$*` can be used rather than
magical `"$@"`.
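
A minimal sketch of the single-assignment form described above, assuming
`test_export_` starts out empty; the actual function in the perf library
may differ in detail.

```shell
# Space-separated list of variable names to export after each test.
test_export_=

test_export () {
	# Names are shell identifiers, so joining them with plain $* is
	# safe; no loop or recursion is needed.
	test_export_="$test_export_ $*"
}
```

Calling `test_export a b` and then `test_export c` leaves the list
holding the three names separated by spaces.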
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
test_perf() runs each test in its own subshell which makes it difficult
to persist variables between tests. test_export() addresses this
shortcoming by grabbing the values of specified variables after a test
runs but before the subshell exits, and writes those values to a file
which is loaded into the environment of subsequent tests.
To grab the values to be persisted, test_export() pipes the output of
the shell's builtin `set` command through `sed` which plucks them out
using a regular expression along the lines of `s/^(var1|var2)/.../p`.
Unfortunately, though, this use of alternation is not portable. For
instance, BSD-lineage `sed` (including macOS `sed`) does not support it
in the default "basic regular expression" mode (BRE). It may be possible
to enable "extended regular expression" mode (ERE) in some cases with
`sed -E`, however, `-E` is neither portable nor part of POSIX.
Fortunately, alternation is unnecessary in this case and can easily be
avoided, so replace it with a series of simple expressions such as
`s/^var1/.../p;s/^var2/.../p`.
While at it, tighten the expressions so they match the variable names
exactly rather than matching prefixes (i.e. use `s/^var1=/.../p`).
If the requirements of test_export() become more complex in the future,
then an alternative would be to replace `sed` with `perl` which supports
alternation on all platforms, however, the simple elimination of
alternation via multiple `sed` expressions suffices for the present.
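
As a rough illustration of the alternation-free approach, the sketch
below emits one expression per variable name. The helper name
`make_export_script` is hypothetical, and the replacement text
(`export name=`) is an assumption, since the expressions quoted above
elide it.

```shell
# Hypothetical helper: print one anchored 's/^name=/.../p' expression per
# variable name, one per line, so that no BRE alternation is required.
make_export_script () {
	for var in "$@"
	do
		# The trailing '=' makes the match exact rather than a prefix.
		printf 's/^%s=/export %s=/p\n' "$var" "$var"
	done
}
```

It could then be used along the lines of
`set | sed -n "$(make_export_script var1 var2)"`, with the caveat that
the output format of `set` varies between shells.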
Reported-by: Sangeeta <sangunb09@gmail.com>
Diagnosed-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git update-ref --stdin" learns to take multiple transactions in a
single session.
* ps/update-ref-multi-transaction:
update-ref: disallow "start" for ongoing transactions
p1400: use `git-update-ref --stdin` to test multiple transactions
update-ref: allow creation of multiple transactions
t1400: avoid touching refs on filesystem
Simplify test and make error messages more clear here.
Per feedback from Junio in
33226af42b (t/perf/fsmonitor: improve error message if typoing hook
name, 2020-10-26)
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit 0a0fbbe3ff (refs: remove lookup cache for
reference-transaction hook, 2020-08-25), a new benchmark was added to
p1400 which has the intention to exercise creation of multiple
transactions in a single process. As git-update-ref wasn't yet able to
create multiple transactions with a single run we instead used git-push.
As its non-atomic version creates a transaction per reference update,
this was the best approximation we could make at that point in time.
Now that `git-update-ref --stdin` supports creation of multiple
transactions, let's convert the benchmark to use that instead. It has
less overhead and it's also a lot clearer what the actual intention is.
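
For illustration only, input for such a benchmark could be generated
like this. The function name `gen_transactions` and the `branch-N` ref
names are hypothetical and not taken from p1400; the sketch assumes the
`start`, `create` and `commit` commands accepted by
`git update-ref --stdin`.

```shell
# Hypothetical generator: emit <count> separate transactions, each one
# creating a single branch that points at <oid>.
gen_transactions () {
	oid=$1 count=$2 i=1
	while test "$i" -le "$count"
	do
		printf 'start\ncreate refs/heads/branch-%d %s\ncommit\n' "$i" "$oid"
		i=$((i + 1))
	done
}
```

The output would be fed to a single process, e.g.
`gen_transactions "$(git rev-parse HEAD)" 1000 | git update-ref --stdin`.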
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This benchmark covers the git status time for a heavily
dirty directory - benchmarking fsmonitor's refresh
When running to compare our perl vs rs-git-fsmonitor - we see that
the perl script incurs significant overhead - further motivation
to provide a faster implementation within git.
7519.7: status (dirty) (fsmonitor=query-watchman) 10.05(7.78+1.56)
7519.20: status (dirty) (fsmonitor=rs-git-fsmonitor) 6.72(4.37+1.64)
7519.33: status (dirty) (fsmonitor=disabled) 5.62(4.24+2.03)
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This prepares for it being called multiple times when
testing different hooks
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is extremely verbose, printing >10K non-useful lines
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The full name is lengthy and makes it hard to read
Before:
7519.3: status (fsmonitor=/home/nipunn/src/server/.git/hooks/rs-git-fsmonitor) 0.02(0.01+0.00)
After
7519.3: status (fsmonitor=rs-git-fsmonitor) 0.03(0.02+0.00)
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There was much duplication here. Prepares for making
changes to the description.
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously - it would silently run the perf suite w/o using
fsmonitor - fsmonitor errors are not hard failures.
Now it errors loudly.
GIT_PERF_7519_FSMONITOR="$HOME/rs-git-fsmonitorr"
./p7519-fsmonitor.sh -i -v
fatal: cannot run /home/nipunn/rs-git-fsmonitorr:
No such file or directory
not ok 2 - setup for fsmonitor
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is only required to be set up once. This prepares for
testing multiple hooks in one invocation.
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for testing multiple fsmonitor hooks
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Much of the benchmark code is redundant. This is
easier to understand and edit.
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Perf tests have not been linted for some time.
They've grown some seq instead of test_seq. This
runs the existing lints on the perf tests as well.
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Results for the git-diff fsmonitor optimization from the patch in the
parent revision (tested on a repository with 400k files).
As you can see here, git diff with fsmonitor running is significantly
better with this patch series (80% faster on my workload)!
GIT_PERF_LARGE_REPO=~/src/server ./run v2.29.0-rc1 . -- p7519-fsmonitor.sh
Test v2.29.0-rc1 this tree
-----------------------------------------------------------------------------------------------------------------
7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman) 1.46(0.82+0.64) 1.47(0.83+0.62) +0.7%
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman) 0.16(0.12+0.04) 0.17(0.12+0.05) +6.3%
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman) 1.36(0.73+0.62) 1.37(0.76+0.60) +0.7%
7519.5: diff (fsmonitor=.git/hooks/fsmonitor-watchman) 0.85(0.22+0.63) 0.14(0.10+0.05) -83.5%
7519.6: diff -- 0_files (fsmonitor=.git/hooks/fsmonitor-watchman) 0.12(0.08+0.05) 0.13(0.11+0.02) +8.3%
7519.7: diff -- 10_files (fsmonitor=.git/hooks/fsmonitor-watchman) 0.12(0.08+0.04) 0.13(0.09+0.04) +8.3%
7519.8: diff -- 100_files (fsmonitor=.git/hooks/fsmonitor-watchman) 0.12(0.07+0.05) 0.13(0.07+0.06) +8.3%
7519.9: diff -- 1000_files (fsmonitor=.git/hooks/fsmonitor-watchman) 0.12(0.09+0.04) 0.13(0.08+0.05) +8.3%
7519.10: diff -- 10000_files (fsmonitor=.git/hooks/fsmonitor-watchman) 0.14(0.09+0.05) 0.13(0.10+0.03) -7.1%
7519.12: status (fsmonitor=) 1.67(0.93+1.49) 1.67(0.99+1.42) +0.0%
7519.13: status -uno (fsmonitor=) 0.37(0.30+0.82) 0.37(0.33+0.79) +0.0%
7519.14: status -uall (fsmonitor=) 1.58(0.97+1.35) 1.57(0.86+1.45) -0.6%
7519.15: diff (fsmonitor=) 0.34(0.28+0.83) 0.34(0.27+0.83) +0.0%
7519.16: diff -- 0_files (fsmonitor=) 0.09(0.06+0.04) 0.09(0.08+0.02) +0.0%
7519.17: diff -- 10_files (fsmonitor=) 0.09(0.07+0.03) 0.09(0.06+0.05) +0.0%
7519.18: diff -- 100_files (fsmonitor=) 0.09(0.06+0.04) 0.09(0.06+0.04) +0.0%
7519.19: diff -- 1000_files (fsmonitor=) 0.09(0.06+0.04) 0.09(0.05+0.05) +0.0%
7519.20: diff -- 10000_files (fsmonitor=) 0.10(0.08+0.04) 0.10(0.06+0.05) +0.0%
I also added a benchmark for a tiny git diff workload w/ a pathspec.
I see an approximately .02 second overhead added w/ and w/o fsmonitor
From looking at these results, I suspected that refresh_fsmonitor
is already happening during git diff - independent of this patch
series' optimization. Confirmed that suspicion by breaking on
refresh_fsmonitor.
(gdb) bt [simplified]
0 refresh_fsmonitor at fsmonitor.c:176
1 ie_match_stat at read-cache.c:375
2 match_stat_with_submodule at diff-lib.c:237
4 builtin_diff_files at builtin/diff.c:260
5 cmd_diff at builtin/diff.c:541
6 run_builtin at git.c:450
7 handle_builtin at git.c:700
8 run_argv at git.c:767
9 cmd_main at git.c:898
10 main at common-main.c:52
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first git status would be inflated due to warming of
filesystem cache. This makes the results comparable.
Before
Test this tree
--------------------------------------------------------------------------------
7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman) 2.52(1.59+1.56)
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman) 0.18(0.12+0.06)
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman) 1.36(0.73+0.62)
7519.7: status (fsmonitor=) 0.69(0.52+0.90)
7519.8: status -uno (fsmonitor=) 0.37(0.28+0.81)
7519.9: status -uall (fsmonitor=) 1.53(0.93+1.32)
After
Test this tree
--------------------------------------------------------------------------------
7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman) 0.39(0.33+0.06)
7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman) 0.17(0.13+0.05)
7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman) 1.34(0.77+0.56)
7519.7: status (fsmonitor=) 0.70(0.53+0.90)
7519.8: status -uno (fsmonitor=) 0.37(0.32+0.78)
7519.9: status -uall (fsmonitor=) 1.55(1.01+1.25)
Signed-off-by: Nipunn Koorapati <nipunn@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is logic to estimate how many objects are in the repository,
which is meant to run once per process invocation, but it ran every
time the estimated value was requested.
* jk/dont-count-existing-objects-twice:
packfile: actually set approximate_object_count_valid
The approximate_object_count() function tries to compute the count only
once per process. But ever since it was introduced in 8e3f52d778
(find_unique_abbrev: move logic out of get_short_sha1(), 2016-10-03), we
failed to actually set the "valid" flag, meaning we'd compute it fresh
on every call.
This turns out not to be _too_ bad, because we're only iterating through
the packed_git list, and not making any system calls. But since it may
get called for every abbreviated hash we output, even this can add up if
you have many packs.
Here are before-and-after timings for a new perf test which just asks
rev-list to abbreviate each commit hash (the test repo is linux.git,
with commit-graphs):
Test                           origin              HEAD
----------------------------------------------------------------------------
5303.3: rev-list (1)           28.91(28.46+0.44)   29.03(28.65+0.38) +0.4%
5303.4: abbrev-commit (1)      1.18(1.06+0.11)     1.17(1.02+0.14) -0.8%
5303.7: rev-list (50)          28.95(28.56+0.38)   29.50(29.17+0.32) +1.9%
5303.8: abbrev-commit (50)     3.67(3.56+0.10)     3.57(3.42+0.15) -2.7%
5303.11: rev-list (1000)       30.34(29.89+0.43)   30.82(30.35+0.46) +1.6%
5303.12: abbrev-commit (1000)  86.82(86.52+0.29)   77.82(77.59+0.22) -10.4%
5303.15: load 10,000 packs     0.08(0.02+0.05)     0.08(0.02+0.06) +0.0%
It doesn't help at all when we have 1 pack (5303.4), but we get a 10%
speedup when there are 1000 packs (5303.12). That's a modest speedup for
a case that's already slow and we'd hope to avoid in general (note how
slow it is even after, because we have to look in each of those packs
for abbreviations). But it's a one-line change that clearly matches the
original intent, so it seems worth doing.
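
The percentages quoted above are consistent with the relative change in
elapsed time between the two runs. A small sketch of that arithmetic,
assuming that is how the column is derived (the helper name
`perf_change` is hypothetical and not part of the perf suite):

```shell
# Hypothetical helper: relative change of elapsed time, in percent,
# computed as (after - before) / before * 100.
perf_change () {
	awk -v before="$1" -v after="$2" \
		'BEGIN { printf "%+.1f%%\n", (after - before) / before * 100 }'
}
```

For 5303.12, `perf_change 86.82 77.82` gives -10.4%, matching the table.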
The included perf test may also be useful for keeping an eye on any
regressions in the overall abbreviation code.
Reported-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When adding the reference-transaction hook, there were concerns about
the performance impact it may have on setups which do not make use of
the new hook at all. After all, it gets executed every time a reftx is
prepared, committed or aborted, which linearly scales with the number of
reference-transactions created per session. And as there are code paths
like `git push` which create a new transaction for each reference to be
updated, this may translate to calling `find_hook()` quite a lot.
To address this concern, a cache was added with the intention to not
repeatedly do negative hook lookups. Turns out this cache caused a
regression, which was fixed via e5256c82e5 (refs: fix interleaving hook
calls with reference-transaction hook, 2020-08-07). In the process of
discussing the fix, we realized that the cache doesn't really help even
in the negative-lookup case. While performance tests added to benchmark
this did show a slight improvement in the 1% range, this really doesn't
warrant having a cache. Furthermore, it's quite flaky, too. E.g. running
it twice in succession produces the following results:
Test master pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.2: update-ref 2.79(2.16+0.74) 2.73(2.12+0.71) -2.2%
1400.3: update-ref --stdin 0.22(0.08+0.14) 0.21(0.08+0.12) -4.5%
Test master pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.2: update-ref 2.70(2.09+0.72) 2.74(2.13+0.71) +1.5%
1400.3: update-ref --stdin 0.21(0.10+0.10) 0.21(0.08+0.13) +0.0%
One case notably absent from those benchmarks is a single executable
searching for the hook hundreds of times, which is exactly the case for
which the negative cache was added. p1400.2 will spawn a new update-ref
for each transaction and p1400.3 only has a single reference-transaction
for all reference updates. So this commit adds a third benchmark, which
performs a non-atomic push of a thousand references. This will create a
new reference transaction per reference. But even for this case, the
negative cache doesn't consistently improve performance:
Test master pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.4: nonatomic push 6.63(6.50+0.13) 6.81(6.67+0.14) +2.7%
1400.4: nonatomic push 6.35(6.21+0.14) 6.39(6.23+0.16) +0.6%
1400.4: nonatomic push 6.43(6.31+0.13) 6.42(6.28+0.15) -0.2%
So let's just remove the cache altogether to simplify the code.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When PERF_EXTRA is enabled, p5302 checks the performance of index-pack
with various numbers of threads. This can be useful for deciding what
the default should be (which is currently capped at 3 threads based on
the results of this script).
However, we only go up to 8 threads, and modern machines may have more.
Let's get the number of CPUs from test-tool, and test various numbers of
threads between one and that maximum.
Note that the current tests aren't all identical, as we have to set
GIT_FORCE_THREADS for the --threads=1 test (which measures the overhead
of starting a single worker thread versus the "0" case of using the main
thread). To keep the loop simple, we'll keep the "0" case out of it, and
set GIT_FORCE_THREADS=1 for all of the other cases (it's a noop for all
but the "1" case, since numbers higher than 1 would always need
threads).
Note also that we could skip running "test-tool" if PERF_EXTRA isn't
set. However, there's some small value in knowing the number of threads,
so that we can mark each test as skipped in the output.
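
One way to pick the thread counts could look like the sketch below. The
doubling sequence is an assumption made for illustration; the text only
says that various counts between one and the CPU count are tested, and
the helper name `thread_counts` is hypothetical.

```shell
# Hypothetical helper: list thread counts 1, 2, 4, ... that do not exceed
# the given maximum (e.g. the number of CPUs reported by test-tool).
thread_counts () {
	max=$1 t=1
	while test "$t" -le "$max"
	do
		echo "$t"
		t=$((t * 2))
	done
}
```

Each value would then drive one timing run with GIT_FORCE_THREADS=1 set
and `--threads=<n>` passed to index-pack, keeping the "0" case separate.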
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The primary function of the perf suite is to detect regressions (or
improvements) between versions of Git. The only numbers we show a direct
comparison for are timings between the same test run on two different
versions.
However, it can sometimes be used to collect other information. For
instance, p5302 runs the same index-pack operation with different thread
counts. The output doesn't directly compare these, but anybody
interested in working on index-pack can manually compare the results.
For a normal regression run of the full perf-suite, though, this incurs
a significant cost to generate numbers nobody will actually look at;
about 25% of the total time of the test suite is spent in p5302. And the
low-thread-count runs are the most expensive part of it, since they're
(unsurprisingly) not using as many threads.
Let's skip these tests by default, but make it possible for people
working on index-pack to still run them by setting an environment
variable. Rather than make this specific to p5302, let's introduce a
generic mechanism. This makes it possible to run the full suite with
every possible test if somebody really wants to burn some CPU.
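
The idea of such a generic opt-in can be sketched as below. Both the
variable name `EXAMPLE_PERF_EXTRA` and the wrapper `run_if_extra` are
made up for this illustration and are not the names used by the perf
suite.

```shell
# Hypothetical wrapper: run an expensive test only when the opt-in
# variable is set to a non-empty value, otherwise report it as skipped.
#   $1: description of the test; remaining arguments: the command to run
run_if_extra () {
	desc=$1
	shift
	if test -n "${EXAMPLE_PERF_EXTRA:-}"
	then
		"$@"
	else
		echo "skipped: $desc"
	fi
}
```

A regular regression run leaves the variable unset and pays nothing for
the extra tests; somebody working on index-pack would set it.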
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The low-level reference transactions used to update references are
currently completely opaque to the user. While certainly desirable in
most usecases, there are some which might want to hook into the
transaction to observe all queued reference updates as well as observing
the abortion or commit of a prepared transaction.
One such usecase would be to have a set of replicas of a given Git
repository, where we perform Git operations on all of the repositories
at once and expect the outcome to be the same in all of them. While
there exist hooks already for a certain subset of Git commands that
could be used to implement a voting mechanism for this, many others
currently don't have any mechanism for this.
The above scenario is the motivation for the new "reference-transaction"
hook that reaches directly into Git's reference transaction mechanism.
The hook receives as parameter the current state the transaction was
moved to ("prepared", "committed" or "aborted") and gets via its
standard input all queued reference updates. While the exit code gets
ignored in the "committed" and "aborted" states, a non-zero exit code in
the "prepared" state will cause the transaction to be aborted
prematurely.
Given the usecase described above, a voting mechanism can now be
implemented via this hook: as soon as it gets called, it will take all
of stdin and use it to cast a vote to a central service. When all
replicas of the repository agree, the hook will exit with zero,
otherwise it will abort the transaction by returning non-zero. The most
important upside is that this will catch _all_ commands writing
references at once, allowing to implement strong consistency for
reference updates via a single mechanism.
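
A rough sketch of how such a hook might be structured, written as shell
functions for illustration. It assumes each line on stdin has the form
"<old-value> <new-value> <ref-name>"; `cast_vote` is a hypothetical
stand-in for the central voting service and simply rejects updates to a
made-up protected branch.

```shell
# Hypothetical stand-in for the voting service: reads the queued updates
# on stdin and fails if any of them touches refs/heads/protected.
cast_vote () {
	! grep -q ' refs/heads/protected$'
}

# Body of a "reference-transaction" hook; $1 is the transaction state.
reference_transaction () {
	state=$1
	case "$state" in
	prepared)
		# A non-zero status here aborts the transaction.
		cast_vote
		;;
	committed|aborted)
		# The exit code is ignored in these states; just drain stdin.
		cat >/dev/null
		;;
	esac
}
```

An installed hook script would end by calling
`reference_transaction "$1"` so that its exit status reflects the vote.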
In order to test the impact on the case where we don't have any
"reference-transaction" hook installed in the repository, this commit
introduces two new performance tests for git-update-ref(1). Run against
an empty repository, they produce the following results:
Test origin/master HEAD
--------------------------------------------------------------------
1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4%
1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0%
The performance test p1400.2 creates, updates and deletes a branch a
thousand times, thus averaging runtime of git-update-ref over 3000
invocations. p1400.3 instead calls `git-update-ref --stdin` three times
and queues a thousand creations, updates and deletes respectively.
As expected, p1400.3 consistently shows no noticeable impact, as for
each batch of updates there's a single call to access(3P) for the
negative hook lookup. On the other hand, for p1400.2, one can see an
impact caused by this patchset. But doing five runs of the performance
tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead
ranged from -1.5% to +1.1%. These inconsistent performance numbers can
be explained by the overhead of spawning 3000 processes. This shows that
the overhead of assembling the hook path and executing access(3P) once
to check if it's there is mostly outweighed by the operating system's
overhead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes a bitmap traversal still has to walk some commits manually,
because those commits aren't included in the bitmap packfile (e.g., due
to a push or commit since the last full repack). If we're given an
object filter, we don't pass it down to this traversal. It's not
necessary for correctness because the bitmap code has its own filters to
post-process the bitmap result (which it must, to filter out the objects
that _are_ mentioned in the bitmapped packfile).
And with blob filters, there was no performance reason to pass along
those filters, either. The fill-in traversal could omit them from the
result, but it wouldn't save us any time to do so, since we'd still have
to walk each tree entry to see if it's a blob or not.
But now that we support tree filters, there's opportunity for savings. A
tree:depth=0 filter means we can avoid accessing trees entirely, since
we know we won't need them (or any of the subtrees or blobs they point to).
The new test in p5310 shows this off (the "partial bitmap" state is one
where HEAD~100 and its ancestors are all in a bitmapped pack, but
HEAD~100..HEAD are not). Here are the results (run against linux.git):
Test HEAD^ HEAD
-------------------------------------------------------------------------------------------------
[...]
5310.16: rev-list with tree filter (partial bitmap) 0.19(0.17+0.02) 0.03(0.02+0.01) -84.2%
The absolute number of savings isn't _huge_, but keep in mind that we
only omitted 100 first-parent links (in the version of linux.git here,
that's 894 actual commits). In a more pathological case, we might have a
much larger proportion of non-bitmapped commits. I didn't bother
creating such a case in the perf script because the setup is expensive,
and this is plenty to show the savings as a percentage.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous patch, we made it easy to define other filters that
exclude all objects of a certain type. Use that in order to implement
bitmap-level filtering for the '--filter=tree:<n>' filter when 'n' is
equal to 0.
The general case is not helped by bitmaps, since for values of 'n > 0',
the object filtering machinery requires a full-blown tree traversal in
order to determine the depth of a given tree. Caching this is
non-obvious, too, since the same tree object can have a different depth
depending on the context (e.g., a tree was moved up in the directory
hierarchy between two commits).
But, the 'n = 0' case can be helped, and this patch does so. Running
p5310.11 in this tree and on master with the kernel, we can see that
this case is helped substantially:
Test master this tree
--------------------------------------------------------------------------------
5310.11: rev-list count with tree:0 10.68(10.39+0.27) 0.06(0.04+0.01) -99.4%
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The custom hash function used by "git fast-import" has been
replaced with the one from hashmap.c, which gave us a nice
performance boost.
* jk/fast-import-use-hashmap:
fast-import: replace custom hash with hashmap.c
We use a custom hash in fast-import to store the set of objects we've
imported so far. It has a fixed set of 2^16 buckets and chains any
collisions with a linked list. As the number of objects grows larger
than that, the load factor increases and we degrade to O(n) lookups and
O(n^2) insertions.
We can scale better by using our hashmap.c implementation, which will
resize the bucket count as we grow. This does incur an extra memory cost
of 8 bytes per object, as hashmap stores the integer hash value for each
entry in its hashmap_entry struct (which we really don't care about
here, because we're just reusing the embedded object hash). But I think
the numbers below justify this (and our per-object memory cost is
already much higher).
I also looked at using khash, but it seemed to perform slightly worse
than hashmap at all sizes, and worse even than the existing code for
small sizes. It's also awkward to use here, because we want to look up a
"struct object_entry" from a "struct object_id", and it doesn't handle
mismatched keys as well. Making a mapping of object_id to object_entry
would be more natural, but that would require pulling the embedded oid
out of the object_entry or incurring an extra 32 bytes per object.
In a synthetic test creating as many cheap, tiny objects as possible
perl -e '
my $bits = shift;
my $nr = 2**$bits;
for (my $i = 0; $i < $nr; $i++) {
print "blob\n";
print "data 4\n";
print pack("N", $i);
}
' $bits | git fast-import
I got these results:
nr_objects   master      khash       hashmap
2^20         0m4.317s    0m5.109s    0m3.890s
2^21         0m10.204s   0m9.702s    0m7.933s
2^22         0m27.159s   0m17.911s   0m16.751s
2^23         1m19.038s   0m35.080s   0m31.963s
2^24         4m18.766s   1m10.233s   1m6.793s
which points to hashmap as the winner. We didn't have any perf tests for
fast-export or fast-import, so I added one as a more real-world case.
It uses an export without blobs since that's significantly cheaper than
a full one, but still is an interesting case people might use (e.g., for
rewriting history). It will emphasize this change in some ways (as a
percentage we spend more time making objects and less shuffling blob
bytes around) and less in others (the total object count is lower).
Here are the results for linux.git:
Test HEAD^ HEAD
----------------------------------------------------------------------------
9300.1: export (no-blobs) 67.64(66.96+0.67) 67.81(67.06+0.75) +0.3%
9300.2: import (no-blobs) 284.04(283.34+0.69) 198.09(196.01+0.92) -30.3%
It only has ~5.2M commits and trees, so this is a larger effect than I
expected (the 2^23 case above only improved by 50s or so, but here we
gained almost 90s). This is probably due to actually performing more
object lookups in a real import with trees and commits, as opposed to
just dumping a bunch of blobs into a pack.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 645c432d61 (pack-objects: use reachability bitmap index when
generating non-stdout pack, 2016-09-10) added two timing tests for
packing to an on-disk file, both with and without bitmaps. However, the
non-bitmap one isn't interesting to have as part of p5310's regression
suite. It _could_ be used as a baseline to show off the improvement in
the bitmap case, but:
- the point of the t/perf suite is to find performance regressions,
and it won't help with that. We don't compare the numbers between
two tests (which the perf suite has no idea are even related), and
any change in its numbers would have nothing to do with bitmaps.
- it did show off the improvement in the commit message of 645c432d61,
but it wasn't even necessary there. The bitmap case already shows an
improvement (because before the patch, it behaved the same as the
non-bitmap case), and the perf suite is even able to show the
difference between the before and after measurements.
On top of that, it's one of the most expensive tests in the suite,
clocking in around 60s for linux.git on my machine (as compared to 16s
for the bitmapped version). And by default when using "./run", we'd run
it three times!
So let's just drop it. It's not useful and is adding minutes to perf
runs.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just as rev-list recently learned to combine filters and bitmaps, let's
do the same for pack-objects. The infrastructure is all there; we just
need to pass along our filter options, and the pack-bitmap code will
decide to use bitmaps or not.
This unsurprisingly makes things faster for partial clones of large
repositories (here we're cloning linux.git):
Test HEAD^ HEAD
------------------------------------------------------------------------------
5310.11: simulated partial clone 38.94(37.28+5.87) 11.06(11.27+4.07) -71.6%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just as the previous commit implemented BLOB_NONE, we can support
BLOB_LIMIT filters by looking at the sizes of any blobs in the result
and unsetting their bits as appropriate. This is slightly more expensive
than BLOB_NONE, but still produces a noticeable speedup (these results
are on git.git):
Test HEAD~2 HEAD
------------------------------------------------------------------------------------
5310.9: rev-list count with blob:none 1.80(1.77+0.02) 0.22(0.20+0.02) -87.8%
5310.10: rev-list count with blob:limit=1k 1.99(1.96+0.03) 0.29(0.25+0.03) -85.4%
The implementation is similar to the BLOB_NONE one, with the exception
that we have to go object-by-object while walking the blob-type bitmap
(since we can't mask out the matches, but must look up the size
individually for each blob). The trick with using ctz64() is taken from
show_objects_for_type(), which likewise needs to find individual bits
(but wants to quickly skip over big chunks without blobs).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can easily support BLOB_NONE filters with bitmaps. Since we know the
types of all of the objects, we just need to clear the result bits of
any blobs.
Note two subtleties in the implementation (which I also called out in
comments):
- we have to include any blobs that were specifically asked for (and
  not reached through graph traversal) to match the non-bitmap version
- we have to handle in-pack and "ext_index" objects separately.
  Arguably prepare_bitmap_walk() could be adding these ext_index
  objects to the type bitmaps. But it doesn't for now, so let's match
  the rest of the bitmap code here (it probably wouldn't be an
  efficiency improvement to do so since the cost of extending those
  bitmaps is about the same as our loop here, but it might make the
  code a bit simpler).
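As a rough illustration of the masking step, here is a small Python model (not git's C code). Bitmaps are represented as Python integers, and `filter_blob_none` and its parameter names are hypothetical. The separate handling of "ext_index" objects is not modeled.

```python
def filter_blob_none(result, blob_type, tip_objects):
    """Drop every blob from `result` except blobs that were asked for directly.

    All three arguments are integers used as bitmaps, one bit per object.
    """
    # Blobs named explicitly on the command line must survive the filter.
    to_filter = blob_type & ~tip_objects
    return result & ~to_filter
```

For example, with `result = 0b111111`, blobs at bits 2 and 3, and bit 2 requested as a tip, only bit 3 is cleared.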
Here are perf results for the new test on git.git:
Test                                    HEAD^             HEAD
--------------------------------------------------------------------------------
5310.9: rev-list count with blob:none   1.67(1.62+0.05)   0.22(0.21+0.02) -86.8%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since we added reachability bitmap support, we've been able to use
it with rev-list to get the full list of objects, like:
git rev-list --objects --use-bitmap-index --all
But you can't do so without --objects, since we weren't ready to just
show the commits. However, the internals of the bitmap code are mostly
ready for this: they avoid opening up trees when walking to fill in the
bitmaps. We just need to actually pass in the rev_info to
traverse_bitmap_commit_list() so it knows which types to bother
triggering our callback for.
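Conceptually, the traversal only needs to know which object types the caller wants before it walks the result. The following Python sketch models that idea under simplifying assumptions: bitmaps are integers, the wanted types are passed as a plain tuple rather than a rev_info, and `traverse_bitmap` is a hypothetical name, not the signature of traverse_bitmap_commit_list().

```python
def traverse_bitmap(result, type_bitmaps, wanted_types, callback):
    """Call `callback(type_name, pos)` for each wanted object in `result`.

    `type_bitmaps` maps a type name to an integer bitmap of objects of
    that type. Types that are not wanted are never examined.
    """
    for type_name in wanted_types:
        bits = result & type_bitmaps[type_name]
        pos = 0
        while bits:
            if bits & 1:
                callback(type_name, pos)
            bits >>= 1
            pos += 1
```

With only "commit" in `wanted_types`, the tree and blob bitmaps are skipped entirely, which corresponds to running without --objects.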
For completeness, the perf test now covers both the existing --objects
case, as well as the new commits-only behavior (the objects one got way
faster when we introduced bitmaps, but obviously isn't improved now).
Here are numbers for linux.git:
Test                         HEAD^             HEAD
------------------------------------------------------------------------
5310.7: rev-list (commits)   8.29(8.10+0.19)   1.76(1.72+0.04) -78.8%
5310.8: rev-list (objects)   8.06(7.94+0.12)   8.14(7.94+0.13) +1.0%
That run was cheating a little, as I didn't have any commit-graph in the
repository, and we'd build it by default these days when running git-gc.
Here are numbers with a commit-graph:
Test                         HEAD^             HEAD
------------------------------------------------------------------------
5310.7: rev-list (commits)   0.70(0.58+0.12)   0.51(0.46+0.04) -27.1%
5310.8: rev-list (objects)   6.20(6.09+0.10)   6.27(6.16+0.11) +1.1%
Still an improvement, but a lot less impressive.
We could have the perf script remove any commit-graph to show the
out-sized effect, but it probably makes sense to leave it in what would
be a more typical setup.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a repository with many packfiles, the procedure that avoids
registering the same packfile twice was unnecessarily expensive because
it used an inefficient search algorithm; this has been corrected.
* cs/store-packfiles-in-hashmap:
packfile.c: speed up loading lots of packfiles
Perf test fix to avoid stale results being mixed up with the latest
round of test results.
* tg/perf-remove-stale-result:
perf-lib: use a single filename for all measurement types
When loading packfiles on start-up, we traverse the internal packfile
list once per file to avoid reloading packfiles that have already
been loaded. This check runs in quadratic time, so for poorly
maintained repos with a large number of packfiles, it can be pretty
slow.
Add a hashmap containing the packfile names as we load them so that
the average runtime cost of checking for already-loaded packs becomes
constant.
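The difference between the two lookup strategies can be illustrated with a short Python model (git's implementation is in C and uses its own hashmap; the function names here are hypothetical). The linear version rescans the list of loaded packs for every new name, while the hashed version keeps a set alongside the list.

```python
def load_packs_linear(names):
    """Register each name once, scanning the loaded list every time."""
    loaded = []
    comparisons = 0
    for name in names:
        found = False
        for existing in loaded:
            comparisons += 1
            if existing == name:
                found = True
                break
        if not found:
            loaded.append(name)
    return loaded, comparisons


def load_packs_hashed(names):
    """Register each name once, using a set for the already-loaded check."""
    loaded = []
    seen = set()
    for name in names:
        if name in seen:
            continue
        seen.add(name)
        loaded.append(name)
    return loaded
```

Both functions produce the same list; only the cost of the duplicate check differs, growing with the square of the number of packs in the linear case.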
Add a perf test to p5303 to show the speed-up.
The existing p5303 test runtimes are dominated by other factors and do
not show an appreciable speed-up. The new test in p5303 clearly exposes
a speed-up in bad cases. In this test we create 10,000 packfiles and
measure the start-up time of git rev-parse, which does little else
besides load in the packs.
Here are the numbers for the new p5303 test:
Test                         HEAD^             HEAD
---------------------------------------------------------------------
5303.12: load 10,000 packs   1.03(0.92+0.10)   0.12(0.02+0.09) -88.3%
Signed-off-by: Colin Stolley <cstolley@runbox.com>
Helped-by: Jeff King <peff@peff.net>
[jc: squashed the change to call hashmap in install_packed_git() by peff]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The perf suite's aggregate.perl depends on Git.pm, which is a mild
annoyance if you've built git with NO_PERL. It turns out that the only
thing we use it for is a single call of the command_oneline() helper.
We can just replace this with backticks or similar.
Annoyingly, perl has no backtick equivalent that avoids a shell eval,
which means our $arg would require quoting. This probably doesn't matter
for our purposes, but it's better to be safe and model good style. So
we'll just provide a short helper around open(), which takes its
arguments as a list.
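The same concern exists in other languages. As an analogy only (the actual helper is written in Perl around open()), here is a Python sketch of a one-line command helper that passes its arguments as a list, so no shell ever sees them and no quoting is needed. The name `first_line_of` is hypothetical.

```python
import subprocess
import sys


def first_line_of(*argv):
    """Run `argv` without a shell and return the first line of its stdout."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, check=True, text=True)
    lines = proc.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    # The argument contains shell metacharacters but is passed through as-is.
    tricky = "a b; echo $HOME 'x'"
    print(first_line_of(sys.executable, "-c",
                        "import sys; print(sys.argv[1])", tricky))
```

Because the argument list goes straight to the child process, a value containing spaces, quotes or semicolons arrives unchanged.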
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The perf tests write files recording the results of tests. These
results are later aggregated by 'aggregate.perl'. If the tests are run
multiple times, those results are overwritten by the new results. This
works just fine as long as there are only perf tests measuring the
times, whose results are stored in "$base".times files.
However 22bec79d1a ("t/perf: add infrastructure for measuring sizes",
2018-08-17) introduced a new type of test for measuring the size of
something. The results of this are written to "$base".size files.
"$base" is essentially made up of the basename of the script plus the
test number. So if test numbers shift because a new test was
introduced earlier in the script we might end up with both a ".times"
and a ".size" file for the same test. In the aggregation script the
".times" file is preferred over the ".size" file, so some size tests
might end with performance numbers from a previous run of the test.
This is mainly relevant when writing perf tests that check both
performance and sizes, and can get quite confusing during
development.
We could fix this by doing a more thorough job of cleaning out old
".times" and ".size" files before running each test. However, an even
easier solution is to just use the same filename for both types of
measurement, meaning we'll always overwrite the previous result. We
don't even need to change the file format to distinguish the two;
aggregate.perl already decides which is which based on a regex of the
content (this may become ambiguous if we add new types in the future,
but we could easily add a header field to the file at that point).
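To illustrate how a reader of a single result file can tell the two measurement types apart from content alone, here is a Python sketch. The file formats are assumptions made for the example (three space-separated numbers for a timing, one bare integer for a size), and the regexes are not copied from aggregate.perl.

```python
import re

# Assumed formats, for illustration only.
TIMES_RE = re.compile(r"^\d+(?:\.\d+)? \d+(?:\.\d+)? \d+(?:\.\d+)?$")
SIZE_RE = re.compile(r"^\d+$")


def parse_result(text):
    """Classify one result file's content as a timing or a size."""
    line = text.strip()
    if TIMES_RE.match(line):
        real, user, system = (float(field) for field in line.split())
        return ("times", (real, user, system))
    if SIZE_RE.match(line):
        return ("size", int(line))
    raise ValueError("unrecognized result: %r" % line)
```

Since each run overwrites the same file, whichever type was written last is the only one the aggregation step can see.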
Based on an initial patch from Thomas Gummerer, who discovered the
problem and did all of the analysis (which I stole for the commit
message above):
https://public-inbox.org/git/20191119185047.8550-1-t.gummerer@gmail.com/
Helped-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch fixes an extreme slowdown in pack-objects when you have more
than 1023 packs. See below for numbers.
Since 43fa44fa3b (pack-objects: move in_pack out of struct object_entry,
2018-04-14), we use a complicated system to save some per-object memory.
Each object_entry struct gets a 10-bit field to store the index of the
pack it's in. We map those indices into pointers using
packing_data->in_pack_by_idx, which we initialize at the start of the
program. If we have 2^10 or more packs, then we instead create an array
of pack pointers, one per object. This is packing_data->in_pack.
So far so good. But there's one other tricky case: if a new pack arrives
after we've initialized in_pack_by_idx, it won't have an index yet. We
solve that by calling oe_map_new_pack(), which just switches on the fly
to the less-optimal in_pack mechanism, allocating the array and
back-filling it for already-seen objects.
But that logic kicks in even when we've switched to it already (whether
because we really did see a new pack, or because we had too many packs
in the first place). The result doesn't produce a wrong outcome, but
it's very slow. What happens is this:
- imagine you have a repo with 500k objects and 2000 packs that you
  want to repack.
- before looking at any objects, we call prepare_in_pack_by_idx(). It
  starts allocating an index for each pack. On the 1024th pack, it
  sees there are too many, so it bails, leaving in_pack_by_idx as
  NULL.
- while actually adding objects to the packing list, we call
  oe_set_in_pack(), which checks whether the pack already has an
  index. If it's one of the packs after the first 1023, then it
  doesn't have one, and we'll call oe_map_new_pack().

  But there's no useful work for that function to do. We're already
  using in_pack, so it just uselessly walks over the complete list of
  objects, trying to backfill in_pack.

  And we end up doing this for almost 1000 packs (each of which may be
  triggered by more than one object). And each time it triggers, we
  may iterate over up to 500k objects. So in the absolute worst case,
  this is quadratic in the number of objects.
The solution is simple: we don't need to bother checking whether the
pack has an index if we've already converted to using in_pack, since by
definition we're not going to use it. So we can just push the "does the
pack have a valid index" check down into that half of the conditional,
where we know we're going to use it.
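The effect of moving the check can be seen in a toy Python model. This is a sketch under loud assumptions, not the C code: dictionaries stand in for the arrays, the index limit is a small parameter (`max_indexed`) instead of 2^10, and the method names merely echo the functions mentioned above. The `fixed` flag selects the order of the checks.

```python
class PackingData:
    """Toy model of the two ways of recording which pack holds an object."""

    def __init__(self, packs, max_indexed, fixed):
        self.fixed = fixed          # True: check order after the fix
        self.objects = []           # objects added so far, in order
        self.in_pack_idx = {}       # object -> small pack index
        self.in_pack = {}           # object -> pack (the fallback array)
        self.backfill_steps = 0     # objects visited by oe_map_new_pack()
        self.pack_index = {}        # pack -> index; absent means "no index"
        self.in_pack_by_idx = {}    # index -> pack; None once we give up
        # Hand out indices until they run out, then bail.
        for i, pack in enumerate(packs):
            if i >= max_indexed:
                self.in_pack_by_idx = None
                break
            self.pack_index[pack] = i + 1
            self.in_pack_by_idx[i + 1] = pack

    def oe_in_pack(self, obj):
        if self.in_pack_by_idx is not None:
            return self.in_pack_by_idx[self.in_pack_idx[obj]]
        return self.in_pack[obj]

    def oe_map_new_pack(self):
        # Switch to the per-object array, walking every object seen so far.
        for obj in self.objects:
            self.backfill_steps += 1
            if obj not in self.in_pack:
                self.in_pack[obj] = self.oe_in_pack(obj)
        self.in_pack_by_idx = None

    def oe_set_in_pack(self, obj, pack):
        has_index = pack in self.pack_index
        if self.fixed:
            # Only ask about the pack's index while indices are in use.
            if self.in_pack_by_idx is not None:
                if has_index:
                    self.in_pack_idx[obj] = self.pack_index[pack]
                    return
                self.oe_map_new_pack()
            self.in_pack[obj] = pack
        else:
            # Old order: an index-less pack always triggers a back-fill.
            if not has_index:
                self.oe_map_new_pack()
            if self.in_pack_by_idx is not None:
                self.in_pack_idx[obj] = self.pack_index[pack]
            else:
                self.in_pack[obj] = pack

    def add(self, obj, pack):
        self.oe_set_in_pack(obj, pack)
        self.objects.append(obj)


def simulate(fixed, n_packs=6, per_pack=10, max_indexed=4):
    packs = ["pack-%d" % i for i in range(n_packs)]
    pd = PackingData(packs, max_indexed, fixed)
    for pack in packs:
        for j in range(per_pack):
            pd.add("%s/obj-%d" % (pack, j), pack)
    return pd
```

In this model, `simulate(False)` walks the object list once for every object in a pack without an index, while `simulate(True)` never walks it, and both end up with the same object-to-pack mapping.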
The current test in p5303 sadly doesn't notice this problem, since it
maxes out at 1000 packs. If we add a new test to it at 2000 packs, it
does show the improvement:
Test                     HEAD^               HEAD
----------------------------------------------------------------------
5303.12: repack (2000)   26.72(39.68+0.67)   15.70(28.70+0.66) -41.2%
However, these many-pack test cases are rather expensive to run, so
adding larger and larger numbers isn't appealing. Instead, we can show
it off more easily by using GIT_TEST_FULL_IN_PACK_ARRAY, which forces us
into the absolute worst case: no pack has an index, so we'll trigger
oe_map_new_pack() pointlessly for every single object, making it truly
quadratic.
Here are the numbers (on git.git) with the included change to p5303:
Test                     HEAD^               HEAD
----------------------------------------------------------------------
5303.3: rev-list (1)     2.05(1.98+0.06)     2.06(1.99+0.06)   +0.5%
5303.4: repack (1)       33.45(33.46+0.19)   2.75(2.73+0.22)   -91.8%
5303.6: rev-list (50)    2.07(2.01+0.06)     2.06(2.01+0.05)   -0.5%
5303.7: repack (50)      34.21(35.18+0.16)   3.49(4.50+0.12)   -89.8%
5303.9: rev-list (1000)  2.87(2.78+0.08)     2.88(2.80+0.07)   +0.3%
5303.10: repack (1000)   41.26(51.30+0.47)   10.75(20.75+0.44) -73.9%
Again, those improvements aren't realistic for the 1-pack case (because
in the real world, the full-array solution doesn't kick in), but it's
more useful to be testing the more-complicated code path.
While we're looking at this issue, we'll tweak one more thing: in
oe_map_new_pack(), we call REALLOC_ARRAY(pack->in_pack). But we'd never
expect to get here unless we're back-filling it for the first time, in
which case it would be NULL. So let's switch that to ALLOC_ARRAY() for
clarity, and add a BUG() to document the expectation. Unfortunately this
code isn't well-covered in the test suite because it's inherently racy
(it only kicks in if somebody else adds a new pack while we're in the
middle of repacking).
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>