There is a logic to estimate how many objects are in the
repository, which is mean to run once per process invocation, but
it ran every time the estimated value was requested.
* jk/dont-count-existing-objects-twice:
packfile: actually set approximate_object_count_valid
The approximate_object_count() function tries to compute the count only
once per process. But ever since it was introduced in 8e3f52d778
(find_unique_abbrev: move logic out of get_short_sha1(), 2016-10-03), we
failed to actually set the "valid" flag, meaning we'd compute it fresh
on every call.
This turns out not to be _too_ bad, because we're only iterating through
the packed_git list, and not making any system calls. But since it may
get called for every abbreviated hash we output, even this can add up if
you have many packs.
Here are before-and-after timings for a new perf test which just asks
rev-list to abbreviate each commit hash (the test repo is linux.git,
with commit-graphs):
Test origin HEAD
----------------------------------------------------------------------------
5303.3: rev-list (1) 28.91(28.46+0.44) 29.03(28.65+0.38) +0.4%
5303.4: abbrev-commit (1) 1.18(1.06+0.11) 1.17(1.02+0.14) -0.8%
5303.7: rev-list (50) 28.95(28.56+0.38) 29.50(29.17+0.32) +1.9%
5303.8: abbrev-commit (50) 3.67(3.56+0.10) 3.57(3.42+0.15) -2.7%
5303.11: rev-list (1000) 30.34(29.89+0.43) 30.82(30.35+0.46) +1.6%
5303.12: abbrev-commit (1000) 86.82(86.52+0.29) 77.82(77.59+0.22) -10.4%
5303.15: load 10,000 packs 0.08(0.02+0.05) 0.08(0.02+0.06) +0.0%
It doesn't help at all when we have 1 pack (5303.4), but we get a 10%
speedup when there are 1000 packs (5303.12). That's a modest speedup for
a case that's already slow and we'd hope to avoid in general (note how
slow it is even after, because we have to look in each of those packs
for abbreviations). But it's a one-line change that clearly matches the
original intent, so it seems worth doing.
The included perf test may also be useful for keeping an eye on any
regressions in the overall abbreviation code.
Reported-by: Rasmus Villemoes <rv@rasmusvillemoes.dk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When adding the reference-transaction hook, there were concerns about
the performance impact it may have on setups which do not make use of
the new hook at all. After all, it gets executed every time a reftx is
prepared, committed or aborted, which linearly scales with the number of
reference-transactions created per session. And as there are code paths
like `git push` which create a new transaction for each reference to be
updated, this may translate to calling `find_hook()` quite a lot.
To address this concern, a cache was added with the intention to not
repeatedly do negative hook lookups. Turns out this cache caused a
regression, which was fixed via e5256c82e5 (refs: fix interleaving hook
calls with reference-transaction hook, 2020-08-07). In the process of
discussing the fix, we realized that the cache doesn't really help even
in the negative-lookup case. While performance tests added to benchmark
this did show a slight improvement in the 1% range, this really doesn't
warrent having a cache. Furthermore, it's quite flaky, too. E.g. running
it twice in succession produces the following results:
Test master pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.2: update-ref 2.79(2.16+0.74) 2.73(2.12+0.71) -2.2%
1400.3: update-ref --stdin 0.22(0.08+0.14) 0.21(0.08+0.12) -4.5%
Test master pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.2: update-ref 2.70(2.09+0.72) 2.74(2.13+0.71) +1.5%
1400.3: update-ref --stdin 0.21(0.10+0.10) 0.21(0.08+0.13) +0.0%
One case notably absent from those benchmarks is a single executable
searching for the hook hundreds of times, which is exactly the case for
which the negative cache was added. p1400.2 will spawn a new update-ref
for each transaction and p1400.3 only has a single reference-transaction
for all reference updates. So this commit adds a third benchmark, which
performs an non-atomic push of a thousand references. This will create a
new reference transaction per reference. But even for this case, the
negative cache doesn't consistently improve performance:
Test master pks-reftx-hook-remove-cache
--------------------------------------------------------------------------
1400.4: nonatomic push 6.63(6.50+0.13) 6.81(6.67+0.14) +2.7%
1400.4: nonatomic push 6.35(6.21+0.14) 6.39(6.23+0.16) +0.6%
1400.4: nonatomic push 6.43(6.31+0.13) 6.42(6.28+0.15) -0.2%
So let's just remove the cache altogether to simplify the code.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When PERF_EXTRA is enabled, p5302 checks the performance of index-pack
with various numbers of threads. This can be useful for deciding what
the default should be (which is currently capped at 3 threads based on
the results of this script).
However, we only go up to 8 threads, and modern machines may have more.
Let's get the number of CPUs from test-tool, and test various numbers of
threads between one and that maximum.
Note that the current tests aren't all identical, as we have to set
GIT_FORCE_THREADS for the --threads=1 test (which measures the overhead
of starting a single worker thread versus the "0" case of using the main
thread). To keep the loop simple, we'll keep the "0" case out of it, and
set GIT_FORCE_THREADS=1 for all of the other cases (it's a noop for all
but the "1" case, since numbers higher than 1 would always need
threads).
Note also that we could skip running "test-tool" if PERF_EXTRA isn't
set. However, there's some small value in knowing the number of threads,
so that we can mark each test as skipped in the output.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The primary function of the perf suite is to detect regressions (or
improvements) between versions of Git. The only numbers we show a direct
comparison for are timings between the same test run on two different
versions.
However, it can sometimes be used to collect other information. For
instance, p5302 runs the same index-pack operation with different thread
counts. The output doesn't directly compare these, but anybody
interested in working on index-pack can manually compare the results.
For a normal regression run of the full perf-suite, though, this incurs
a significant cost to generate numbers nobody will actually look at;
about 25% of the total time of the test suite is spent in p5302. And the
low-thread-count runs are the most expensive part of it, since they're
(unsurprisingly) not using as many threads.
Let's skip these tests by default, but make it possible for people
working on index-pack to still run them by setting an environment
variable. Rather than make this specific to p5302, let's introduce a
generic mechanism. This makes it possible to run the full suite with
every possible test if somebody really wants to burn some CPU.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The low-level reference transactions used to update references are
currently completely opaque to the user. While certainly desirable in
most usecases, there are some which might want to hook into the
transaction to observe all queued reference updates as well as observing
the abortion or commit of a prepared transaction.
One such usecase would be to have a set of replicas of a given Git
repository, where we perform Git operations on all of the repositories
at once and expect the outcome to be the same in all of them. While
there exist hooks already for a certain subset of Git commands that
could be used to implement a voting mechanism for this, many others
currently don't have any mechanism for this.
The above scenario is the motivation for the new "reference-transaction"
hook that reaches directly into Git's reference transaction mechanism.
The hook receives as parameter the current state the transaction was
moved to ("prepared", "committed" or "aborted") and gets via its
standard input all queued reference updates. While the exit code gets
ignored in the "committed" and "aborted" states, a non-zero exit code in
the "prepared" state will cause the transaction to be aborted
prematurely.
Given the usecase described above, a voting mechanism can now be
implemented via this hook: as soon as it gets called, it will take all
of stdin and use it to cast a vote to a central service. When all
replicas of the repository agree, the hook will exit with zero,
otherwise it will abort the transaction by returning non-zero. The most
important upside is that this will catch _all_ commands writing
references at once, allowing to implement strong consistency for
reference updates via a single mechanism.
In order to test the impact on the case where we don't have any
"reference-transaction" hook installed in the repository, this commit
introduce two new performance tests for git-update-refs(1). Run against
an empty repository, it produces the following results:
Test origin/master HEAD
--------------------------------------------------------------------
1400.2: update-ref 2.70(2.10+0.71) 2.71(2.10+0.73) +0.4%
1400.3: update-ref --stdin 0.21(0.09+0.11) 0.21(0.07+0.14) +0.0%
The performance test p1400.2 creates, updates and deletes a branch a
thousand times, thus averaging runtime of git-update-refs over 3000
invocations. p1400.3 instead calls `git-update-refs --stdin` three times
and queues a thousand creations, updates and deletes respectively.
As expected, p1400.3 consistently shows no noticeable impact, as for
each batch of updates there's a single call to access(3P) for the
negative hook lookup. On the other hand, for p1400.2, one can see an
impact caused by this patchset. But doing five runs of the performance
tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead
ranged from -1.5% to +1.1%. These inconsistent performance numbers can
be explained by the overhead of spawning 3000 processes. This shows that
the overhead of assembling the hook path and executing access(3P) once
to check if it's there is mostly outweighed by the operating system's
overhead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes a bitmap traversal still has to walk some commits manually,
because those commits aren't included in the bitmap packfile (e.g., due
to a push or commit since the last full repack). If we're given an
object filter, we don't pass it down to this traversal. It's not
necessary for correctness because the bitmap code has its own filters to
post-process the bitmap result (which it must, to filter out the objects
that _are_ mentioned in the bitmapped packfile).
And with blob filters, there was no performance reason to pass along
those filters, either. The fill-in traversal could omit them from the
result, but it wouldn't save us any time to do so, since we'd still have
to walk each tree entry to see if it's a blob or not.
But now that we support tree filters, there's opportunity for savings. A
tree:depth=0 filter means we can avoid accessing trees entirely, since
we know we won't them (or any of the subtrees or blobs they point to).
The new test in p5310 shows this off (the "partial bitmap" state is one
where HEAD~100 and its ancestors are all in a bitmapped pack, but
HEAD~100..HEAD are not). Here are the results (run against linux.git):
Test HEAD^ HEAD
-------------------------------------------------------------------------------------------------
[...]
5310.16: rev-list with tree filter (partial bitmap) 0.19(0.17+0.02) 0.03(0.02+0.01) -84.2%
The absolute number of savings isn't _huge_, but keep in mind that we
only omitted 100 first-parent links (in the version of linux.git here,
that's 894 actual commits). In a more pathological case, we might have a
much larger proportion of non-bitmapped commits. I didn't bother
creating such a case in the perf script because the setup is expensive,
and this is plenty to show the savings as a percentage.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous patch, we made it easy to define other filters that
exclude all objects of a certain type. Use that in order to implement
bitmap-level filtering for the '--filter=tree:<n>' filter when 'n' is
equal to 0.
The general case is not helped by bitmaps, since for values of 'n > 0',
the object filtering machinery requires a full-blown tree traversal in
order to determine the depth of a given tree. Caching this is
non-obvious, too, since the same tree object can have a different depth
depending on the context (e.g., a tree was moved up in the directory
hierarchy between two commits).
But, the 'n = 0' case can be helped, and this patch does so. Running
p5310.11 in this tree and on master with the kernel, we can see that
this case is helped substantially:
Test master this tree
--------------------------------------------------------------------------------
5310.11: rev-list count with tree:0 10.68(10.39+0.27) 0.06(0.04+0.01) -99.4%
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The custom hash function used by "git fast-import" has been
replaced with the one from hashmap.c, which gave us a nice
performance boost.
* jk/fast-import-use-hashmap:
fast-import: replace custom hash with hashmap.c
We use a custom hash in fast-import to store the set of objects we've
imported so far. It has a fixed set of 2^16 buckets and chains any
collisions with a linked list. As the number of objects grows larger
than that, the load factor increases and we degrade to O(n) lookups and
O(n^2) insertions.
We can scale better by using our hashmap.c implementation, which will
resize the bucket count as we grow. This does incur an extra memory cost
of 8 bytes per object, as hashmap stores the integer hash value for each
entry in its hashmap_entry struct (which we really don't care about
here, because we're just reusing the embedded object hash). But I think
the numbers below justify this (and our per-object memory cost is
already much higher).
I also looked at using khash, but it seemed to perform slightly worse
than hashmap at all sizes, and worse even than the existing code for
small sizes. It's also awkward to use here, because we want to look up a
"struct object_entry" from a "struct object_id", and it doesn't handle
mismatched keys as well. Making a mapping of object_id to object_entry
would be more natural, but that would require pulling the embedded oid
out of the object_entry or incurring an extra 32 bytes per object.
In a synthetic test creating as many cheap, tiny objects as possible
perl -e '
my $bits = shift;
my $nr = 2**$bits;
for (my $i = 0; $i < $nr; $i++) {
print "blob\n";
print "data 4\n";
print pack("N", $i);
}
' $bits | git fast-import
I got these results:
nr_objects master khash hashmap
2^20 0m4.317s 0m5.109s 0m3.890s
2^21 0m10.204s 0m9.702s 0m7.933s
2^22 0m27.159s 0m17.911s 0m16.751s
2^23 1m19.038s 0m35.080s 0m31.963s
2^24 4m18.766s 1m10.233s 1m6.793s
which points to hashmap as the winner. We didn't have any perf tests for
fast-export or fast-import, so I added one as a more real-world case.
It uses an export without blobs since that's significantly cheaper than
a full one, but still is an interesting case people might use (e.g., for
rewriting history). It will emphasize this change in some ways (as a
percentage we spend more time making objects and less shuffling blob
bytes around) and less in others (the total object count is lower).
Here are the results for linux.git:
Test HEAD^ HEAD
----------------------------------------------------------------------------
9300.1: export (no-blobs) 67.64(66.96+0.67) 67.81(67.06+0.75) +0.3%
9300.2: import (no-blobs) 284.04(283.34+0.69) 198.09(196.01+0.92) -30.3%
It only has ~5.2M commits and trees, so this is a larger effect than I
expected (the 2^23 case above only improved by 50s or so, but here we
gained almost 90s). This is probably due to actually performing more
object lookups in a real import with trees and commits, as opposed to
just dumping a bunch of blobs into a pack.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 645c432d61 (pack-objects: use reachability bitmap index when
generating non-stdout pack, 2016-09-10) added two timing tests for
packing to an on-disk file, both with and without bitmaps. However, the
non-bitmap one isn't interesting to have as part of p5310's regression
suite. It _could_ be used as a baseline to show off the improvement in
the bitmap case, but:
- the point of the t/perf suite is to find performance regressions,
and it won't help with that. We don't compare the numbers between
two tests (which the perf suite has no idea are even related), and
any change in its numbers would have nothing to do with bitmaps.
- it did show off the improvement in the commit message of 645c432d61,
but it wasn't even necessary there. The bitmap case already shows an
improvement (because before the patch, it behaved the same as the
non-bitmap case), and the perf suite is even able to show the
difference between the before and after measurements.
On top of that, it's one of the most expensive tests in the suite,
clocking in around 60s for linux.git on my machine (as compared to 16s
for the bitmapped version). And by default when using "./run", we'd run
it three times!
So let's just drop it. It's not useful and is adding minutes to perf
runs.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just as rev-list recently learned to combine filters and bitmaps, let's
do the same for pack-objects. The infrastructure is all there; we just
need to pass along our filter options, and the pack-bitmap code will
decide to use bitmaps or not.
This unsurprisingly makes things faster for partial clones of large
repositories (here we're cloning linux.git):
Test HEAD^ HEAD
------------------------------------------------------------------------------
5310.11: simulated partial clone 38.94(37.28+5.87) 11.06(11.27+4.07) -71.6%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just as the previous commit implemented BLOB_NONE, we can support
BLOB_LIMIT filters by looking at the sizes of any blobs in the result
and unsetting their bits as appropriate. This is slightly more expensive
than BLOB_NONE, but still produces a noticeable speedup (these results
are on git.git):
Test HEAD~2 HEAD
------------------------------------------------------------------------------------
5310.9: rev-list count with blob:none 1.80(1.77+0.02) 0.22(0.20+0.02) -87.8%
5310.10: rev-list count with blob:limit=1k 1.99(1.96+0.03) 0.29(0.25+0.03) -85.4%
The implementation is similar to the BLOB_NONE one, with the exception
that we have to go object-by-object while walking the blob-type bitmap
(since we can't mask out the matches, but must look up the size
individually for each blob). The trick with using ctz64() is taken from
show_objects_for_type(), which likewise needs to find individual bits
(but wants to quickly skip over big chunks without blobs).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can easily support BLOB_NONE filters with bitmaps. Since we know the
types of all of the objects, we just need to clear the result bits of
any blobs.
Note two subtleties in the implementation (which I also called out in
comments):
- we have to include any blobs that were specifically asked for (and
not reached through graph traversal) to match the non-bitmap version
- we have to handle in-pack and "ext_index" objects separately.
Arguably prepare_bitmap_walk() could be adding these ext_index
objects to the type bitmaps. But it doesn't for now, so let's match
the rest of the bitmap code here (it probably wouldn't be an
efficiency improvement to do so since the cost of extending those
bitmaps is about the same as our loop here, but it might make the
code a bit simpler).
Here are perf results for the new test on git.git:
Test HEAD^ HEAD
--------------------------------------------------------------------------------
5310.9: rev-list count with blob:none 1.67(1.62+0.05) 0.22(0.21+0.02) -86.8%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since we added reachability bitmap support, we've been able to use
it with rev-list to get the full list of objects, like:
git rev-list --objects --use-bitmap-index --all
But you can't do so without --objects, since we weren't ready to just
show the commits. However, the internals of the bitmap code are mostly
ready for this: they avoid opening up trees when walking to fill in the
bitmaps. We just need to actually pass in the rev_info to
traverse_bitmap_commit_list() so it knows which types to bother
triggering our callback for.
For completeness, the perf test now covers both the existing --objects
case, as well as the new commits-only behavior (the objects one got way
faster when we introduced bitmaps, but obviously isn't improved now).
Here are numbers for linux.git:
Test HEAD^ HEAD
------------------------------------------------------------------------
5310.7: rev-list (commits) 8.29(8.10+0.19) 1.76(1.72+0.04) -78.8%
5310.8: rev-list (objects) 8.06(7.94+0.12) 8.14(7.94+0.13) +1.0%
That run was cheating a little, as I didn't have any commit-graph in the
repository, and we'd built it by default these days when running git-gc.
Here are numbers with a commit-graph:
Test HEAD^ HEAD
------------------------------------------------------------------------
5310.7: rev-list (commits) 0.70(0.58+0.12) 0.51(0.46+0.04) -27.1%
5310.8: rev-list (objects) 6.20(6.09+0.10) 6.27(6.16+0.11) +1.1%
Still an improvement, but a lot less impressive.
We could have the perf script remove any commit-graph to show the
out-sized effect, but it probably makes sense to leave it in what would
be a more typical setup.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a repository with many packfiles, the cost of the procedure that
avoids registering the same packfile twice was unnecessarily high
by using an inefficient search algorithm, which has been corrected.
* cs/store-packfiles-in-hashmap:
packfile.c: speed up loading lots of packfiles
PerfTest fix to avoid stale result mixed up with the latest round
of test results.
* tg/perf-remove-stale-result:
perf-lib: use a single filename for all measurement types
When loading packfiles on start-up, we traverse the internal packfile
list once per file to avoid reloading packfiles that have already
been loaded. This check runs in quadratic time, so for poorly
maintained repos with a large number of packfiles, it can be pretty
slow.
Add a hashmap containing the packfile names as we load them so that
the average runtime cost of checking for already-loaded packs becomes
constant.
Add a perf test to p5303 to show speed-up.
The existing p5303 test runtimes are dominated by other factors and do
not show an appreciable speed-up. The new test in p5303 clearly exposes
a speed-up in bad cases. In this test we create 10,000 packfiles and
measure the start-up time of git rev-parse, which does little else
besides load in the packs.
Here are the numbers for the new p5303 test:
Test HEAD^ HEAD
---------------------------------------------------------------------
5303.12: load 10,000 packs 1.03(0.92+0.10) 0.12(0.02+0.09) -88.3%
Signed-off-by: Colin Stolley <cstolley@runbox.com>
Helped-by: Jeff King <peff@peff.net>
[jc: squashed the change to call hashmap in install_packed_git() by peff]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The perf suite's aggregate.perl depends on Git.pm, which is a mild
annoyance if you've built git with NO_PERL. It turns out that the only
thing we use it for is a single call of the command_oneline() helper.
We can just replace this with backticks or similar.
Annoyingly, perl has no backtick equivalent that avoids a shell eval,
which means our $arg would require quoting. This probably doesn't matter
for our purposes, but it's better to be safe and model good style. So
we'll just provide a short helper around open(), which takes its
arguments as a list.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The perf tests write files recording the results of tests. These
results are later aggregated by 'aggregate.perl'. If the tests are run
multiple times, those results are overwritten by the new results. This
works just fine as long as there are only perf tests measuring the
times, whose results are stored in "$base".times files.
However 22bec79d1a ("t/perf: add infrastructure for measuring sizes",
2018-08-17) introduced a new type of test for measuring the size of
something. The results of this are written to "$base".size files.
"$base" is essentially made up of the basename of the script plus the
test number. So if test numbers shift because a new test was
introduced earlier in the script we might end up with both a ".times"
and a ".size" file for the same test. In the aggregation script the
".times" file is preferred over the ".size" file, so some size tests
might end with performance numbers from a previous run of the test.
This is mainly relevant when writing perf tests that check both
performance and sizes, and can get quite confusing during
developement.
We could fix this by doing a more thorough job of cleaning out old
".times" and ".size" files before running each test. However, an even
easier solution is to just use the same filename for both types of
measurement, meaning we'll always overwrite the previous result. We
don't even need to change the file format to distinguish the two;
aggregate.perl already decides which is which based on a regex of the
content (this may become ambiguous if we add new types in the future,
but we could easily add a header field to the file at that point).
Based on an initial patch from Thomas Gummerer, who discovered the
problem and did all of the analysis (which I stole for the commit
message above):
https://public-inbox.org/git/20191119185047.8550-1-t.gummerer@gmail.com/
Helped-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch fixes an extreme slowdown in pack-objects when you have more
than 1023 packs. See below for numbers.
Since 43fa44fa3b (pack-objects: move in_pack out of struct object_entry,
2018-04-14), we use a complicated system to save some per-object memory.
Each object_entry structs gets a 10-bit field to store the index of the
pack it's in. We map those indices into pointers using
packing_data->in_pack_by_idx, which we initialize at the start of the
program. If we have 2^10 or more packs, then we instead create an array
of pack pointers, one per object. This is packing_data->in_pack.
So far so good. But there's one other tricky case: if a new pack arrives
after we've initialized in_pack_by_idx, it won't have an index yet. We
solve that by calling oe_map_new_pack(), which just switches on the fly
to the less-optimal in_pack mechanism, allocating the array and
back-filling it for already-seen objects.
But that logic kicks in even when we've switched to it already (whether
because we really did see a new pack, or because we had too many packs
in the first place). The result doesn't produce a wrong outcome, but
it's very slow. What happens is this:
- imagine you have a repo with 500k objects and 2000 packs that you
want to repack.
- before looking at any objects, we call prepare_in_pack_by_idx(). It
starts allocating an index for each pack. On the 1024th pack, it
sees there are too many, so it bails, leaving in_pack_by_idx as
NULL.
- while actually adding objects to the packing list, we call
oe_set_in_pack(), which checks whether the pack already has an
index. If it's one of the packs after the first 1023, then it
doesn't have one, and we'll call oe_map_new_pack().
But there's no useful work for that function to do. We're already
using in_pack, so it just uselessly walks over the complete list of
objects, trying to backfill in_pack.
And we end up doing this for almost 1000 packs (each of which may be
triggered by more than one object). And each time it triggers, we
may iterate over up to 500k objects. So in the absolute worst case,
this is quadratic in the number of objects.
The solution is simple: we don't need to bother checking whether the
pack has an index if we've already converted to using in_pack, since by
definition we're not going to use it. So we can just push the "does the
pack have a valid index" check down into that half of the conditional,
where we know we're going to use it.
The current test in p5303 sadly doesn't notice this problem, since it
maxes out at 1000 packs. If we add a new test to it at 2000 packs, it
does show the improvement:
Test HEAD^ HEAD
----------------------------------------------------------------------
5303.12: repack (2000) 26.72(39.68+0.67) 15.70(28.70+0.66) -41.2%
However, these many-pack test cases are rather expensive to run, so
adding larger and larger numbers isn't appealing. Instead, we can show
it off more easily by using GIT_TEST_FULL_IN_PACK_ARRAY, which forces us
into the absolute worst case: no pack has an index, so we'll trigger
oe_map_new_pack() pointlessly for every single object, making it truly
quadratic.
Here are the numbers (on git.git) with the included change to p5303:
Test HEAD^ HEAD
----------------------------------------------------------------------
5303.3: rev-list (1) 2.05(1.98+0.06) 2.06(1.99+0.06) +0.5%
5303.4: repack (1) 33.45(33.46+0.19) 2.75(2.73+0.22) -91.8%
5303.6: rev-list (50) 2.07(2.01+0.06) 2.06(2.01+0.05) -0.5%
5303.7: repack (50) 34.21(35.18+0.16) 3.49(4.50+0.12) -89.8%
5303.9: rev-list (1000) 2.87(2.78+0.08) 2.88(2.80+0.07) +0.3%
5303.10: repack (1000) 41.26(51.30+0.47) 10.75(20.75+0.44) -73.9%
Again, those improvements aren't realistic for the 1-pack case (because
in the real world, the full-array solution doesn't kick in), but it's
more useful to be testing the more-complicated code path.
While we're looking at this issue, we'll tweak one more thing: in
oe_map_new_pack(), we call REALLOC_ARRAY(pack->in_pack). But we'd never
expect to get here unless we're back-filling it for the first time, in
which case it would be NULL. So let's switch that to ALLOC_ARRAY() for
clarity, and add a BUG() to document the expectation. Unfortunately this
code isn't well-covered in the test suite because it's inherently racy
(it only kicks in if somebody else adds a new pack while we're in the
middle of repacking).
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two perf scripts numbered p5600, but with otherwise different
names ("clone-reference" versus "partial-clone"). We store timing
results in files named after the whole script, so internally we don't
get confused between the two. But "aggregate.perl" just prints the test
number for each result, giving multiple entries for "5600.3". It also
makes it impossible to skip one test but not the other with
GIT_SKIP_TESTS.
Let's renumber the one that appeared later (by date -- the source of the
problem is that the two were developed on independent branches). For the
non-perf test suite, our test-lint rule would have complained about this
when the two were merged, but t/perf never learned that trick.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we receive a remote ref update to sha1 "X", we want to check that
we have all of the objects needed by "X". We can assume that our
repository is not currently corrupted, and therefore if we have a ref
pointing at "Y", we have all of its objects. So we can stop our
traversal from "X" as soon as we hit "Y".
If we make the same non-corruption assumption about any repositories we
use to store alternates, then we can also use their ref tips to shorten
the traversal.
This is especially useful when cloning with "--reference", as we
otherwise do not have any local refs to check against, and have to
traverse the whole history, even though the other side may have sent us
few or no objects. Here are results for the included perf test (which
shows off more or less the maximal savings, getting one new commit and
sharing the whole history):
Test HEAD^ HEAD
--------------------------------------------------------------------
[on git.git]
5600.3: clone --reference 2.94(2.86+0.08) 0.09(0.08+0.01) -96.9%
[on linux.git]
5600.3: clone --reference 45.74(45.34+0.41) 0.36(0.30+0.08) -99.2%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Performance test framework has been broken and measured the version
of Git that happens to be on $PATH, not the specified one to
measure, for a while, which has been corrected.
* ab/perf-installed-fix:
perf-lib.sh: forbid the use of GIT_TEST_INSTALLED
perf tests: add "bindir" prefix to git tree test results
perf-lib.sh: remove GIT_TEST_INSTALLED from perf-lib.sh
perf-lib.sh: make "./run <revisions>" use the correct gits
perf aggregate: remove GIT_TEST_INSTALLED from --codespeed
perf README: correct docs for 3c8f12c96c regression
The script to aggregate perf result unconditionally depended on
libjson-perl even though it did not have to, which has been
corrected.
* jk/perf-aggregate-wo-libjson:
t/perf: depend on perl JSON only when using --codespeed
Fix index-pack perf test so that the repeated invocations always
run in an empty repository, which emulates the initial clone
situation better.
* jk/p5302-avoid-collision-check-cost:
p5302: create the repo in each index-pack test
The connectivity bitmaps are created by default in bare
repositories now; also the pathname hash-cache is created by
default to avoid making crappy deltas when repacking.
* ew/repack-with-bitmaps-by-default:
pack-objects: default to writing bitmap hash-cache
t5310: correctly remove bitmaps for jgit test
repack: enable bitmaps by default on bare repos
During an initial "git clone --depth=..." partial clone, it is
pointless to spend cycles for a large portion of the connectivity
check that enumerates and skips promisor objects (which by
definition is all objects fetched from the other side). This has
been optimized out.
* js/partial-clone-connectivity-check:
t/perf: add perf script for partial clones
clone: do faster object check for partial clones
As noted in preceding commits setting GIT_TEST_INSTALLED has never
been supported or documented, and as noted in an earlier t/perf/README
change to the extent that it's been documented nobody's notices that
the example hasn't worked since 3c8f12c96c ("test-lib: reorder and
include GIT-BUILD-OPTIONS a lot earlier", 2012-06-24).
We could directly support GIT_TEST_INSTALLED for invocations without
the "run" script, such as:
GIT_TEST_INSTALLED=../../ ./p0000-perf-lib-sanity.sh
GIT_TEST_INSTALLED=/home/avar/g/git ./p0000-perf-lib-sanity.sh
But while not having this "error" will "work", it won't write the the
resulting "test-results/*" files to the right place, and thus a
subsequent call to aggregate.perl won't work as expected.
Let's just tell the user that they need to use the "run" script,
which'll correctly deal with this and set the right
PERF_RESULTS_PREFIX.
If someone's in desperate need of bypassing "run" for whatever reason
they can trivially do so by setting "PERF_SET_GIT_TEST_INSTALLED", but
not we won't have people who expect GIT_TEST_INSTALLED to just work
wondering why their aggregation doesn't work, even though they're
running the right "git".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Change the output file names in test-results/ to be
"test-results/bindir_<munged dir>" rather than just
"test-results/<munged dir>".
This is for consistency with the "build_" directories we have for
built revisions, i.e. "test-results/build_<SHA-1>".
There's no user-visible functional changes here, it just makes it
easier to see at a glance what "test-results" files are of what "type"
as they're all explicitly grouped together now, and to grep this code
to find both the run_dirs_helper() implementation and its
corresponding aggregate.perl code.
Note that we already guarantee that the rest of the
PERF_RESULTS_PREFIX is an absolute path, and since it'll start with
e.g. "/" which we munge to "_" we'll up with a readable string like
"bindir_home_avar[...]".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Follow-up my preceding change which fixed the immediate "./run
<revisions>" regression in 0baf78e7bc ("perf-lib.sh: rely on
test-lib.sh for --tee handling", 2019-03-15) and entirely get rid of
GIT_TEST_INSTALLED from perf-lib.sh (and aggregate.perl).
As noted in that change the dance we're doing with GIT_TEST_INSTALLED
perf-lib.sh isn't necessary, but there I was doing the most minimal
set of changes to quickly fix a regression.
But it's much simpler to never deal with the "GIT_TEST_INSTALLED" we
were setting in perf-lib.sh at all. Instead the run_dirs_helper() sets
the previously inferred $PERF_RESULTS_PREFIX directly.
Setting this at the callsite that's already best positioned to
exhaustively know about all the different cases we need to handle
where PERF_RESULTS_PREFIX isn't what we want already (the empty
string) makes the most sense. In one-off cases like:
./run ./p0000-perf-lib-sanity.sh
./p0000-perf-lib-sanity.sh
We'll just do the right thing because PERF_RESULTS_PREFIX will be
empty, and test-lib.sh takes care of finding where our git is.
Any refactoring of this code needs to change both the shell code and
the Perl code in aggregate.perl, because when running e.g.:
./run ../../ -- <test>
The "../../" path to a relative bindir needs to be munged to a
filename containing the results, and critically aggregate.perl does
not get passed the path to those aggregations, just "../..".
Let's fix cases where aggregate.perl would print e.g. ".." in its
report output for this, and "git" for "/home/avar/g/git", i.e. it
would always pick the last element. Now'll always print the full path
instead.
This also makes the code sturdier, e.g. you can feed "../.." to
"./run" and then an absolute path to the aggregate.perl script, as
long as the absolute path and "../.." resolved to the same directory
printing the aggregation will work.
Also simplify the "[_*]" on the RHS of "tr -c", we're trimming
everything to "_", so we don't need that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Fix a really bad regression in 0baf78e7bc ("perf-lib.sh: rely on
test-lib.sh for --tee handling", 2019-03-15). Since that change all
runs of different <revisions> of git have used the git found in the
user's $PATH, e.g. /usr/bin/git instead of the <revision> we just
built and wanted to performance test.
The problem starts with GIT_TEST_INSTALLED not working like our
non-perf tests with the "run" script. I.e. you can't run performance
tests against a given installed git. Instead we expect to use it
ourselves to point GIT_TEST_INSTALLED to the <revision> we just built.
However, we had been relying on '$(cd "$GIT_TEST_INSTALLED" && pwd)'
to resolve that relative $GIT_TEST_INSTALLED to an absolute
path *before* test-lib.sh was loaded, in cases where it was
e.g. "build/<rev>/bin-wrappers" and we wanted "<abs_path>build/...".
This change post-dates another proposed solution by a few days[1], I
didn't notice that version when I initially wrote this. I'm doing the
most minimal thing to solve the regression here, a follow-up change
will move this result prefix selection logic entirely into the "run"
script.
This makes e.g. these cases all work:
./run . $PWD/../../ origin/master origin/next HEAD -- <tests>
As well as just a plain one-off:
./run <tests>
And, since we're passing down the new GIT_PERF_DIR_MYDIR_REL we make
sure the bug relating to aggregate.perl not finding our files as
described in 0baf78e7bc doesn't happen again.
What *doesn't* work is setting GIT_TEST_INSTALLED to a relative path,
this will subtly fail in test-lib.sh. This has always been the case
even before 0baf78e7bc, and as documented in t/README the
GIT_TEST_INSTALLED variable should be set to an absolute path (needs
to be set "to the bindir", which is always absolute), and the "perf"
framework expects to munge it itself.
Perhaps that should be dealt with in the future to allow manually
setting GIT_TEST_INSTALLED, but as a preceding commit showed the user
can just use the "run" script, which'll also pick the right output
directory for the test results as expected by aggregate.perl.
1. https://public-inbox.org/git/20190502222409.GA15631@sigill.intra.peff.net/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Remove the setting of the "environment" from the --codespeed output. I
don't think this is useful, and it helps with a later refactoring
where we GIT_TEST_INSTALLED stop munging/reading GIT_TEST_INSTALLED in
the perf tests in so many places.
This was added in 05eb1c37ed ("perf/aggregate: implement codespeed
JSON output", 2018-01-05), but since the "run" scripts uses
"GIT_TEST_INSTALLED" internally this was only ever useful for one-off
runs of a single revision as all the "environment" values would be
ones for whatever directory the "run" script ran last.
Let's instead fall back on the "uname -r" case, which is the sort of
thing the environment should be set to, not something that duplicates
other parts of the codpseed output. For setting the "environment" to
something custom the perf.repoName variable can be used. See
19cf57a92e ("perf/run: read GIT_PERF_REPO_NAME from perf.repoName",
2018-01-05).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Since 3c8f12c96c ("test-lib: reorder and include GIT-BUILD-OPTIONS a
lot earlier", 2012-06-24) the suggested advice of overriding
GIT_BUILD_DIR has not worked. We've printed a hard error like this
given e.g. GIT_BUILD_DIR=/home/avar/g/git:
/bin-wrappers/git is not executable; using GIT_EXEC_PATH
error: You haven't built things yet, have you?
Let's just suggest that the user run other gits via the "run"
script. That'll do the right thing for setting the path to the other
git, and running the "aggregate.perl" scripts afterwards will work.
As an aside, if setting GIT_BUILD_DIR had still worked, then the
MODERN_GIT feature/fix added in 1a0962dee5 ("t/perf: fix regression in
testing older versions of git", 2016-06-22) would have broke.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
We don't cover the partial clone feature at all in t/perf. Let's at
least run a few basic tests so that we'll notice any regressions.
We'll do a no-blob clone, and split it into two parts: the actual object
transfer, and the subsequent checkout (which will of course require
another transfer to get the blobs). That will help us more clearly
assess the performance of each.
There are obviously a lot more possibilities besides just a no-blob
partial clone, but this should serve as a canary that alerts us to any
generic slow-downs (and we can add more tests later for cases that
aren't exercised here).
There are a few non-ideal things here that make this not an entirely
accurate test, but are probably OK for our purposes:
1. We have to do some extra prep/cleanup work inside the timing tests,
since they impact the on-disk state and the perf harness may run
each one multiple times.
In practice this is probably OK, since these bits should be much
less expensive than the operations we are measuring.
2. The clone time is likely to be dominated by the server's object
enumeration. In the real world, a repo large enough to drive people
to partial clones is likely to have reachability bitmaps enabled.
And in the opposite direction, our object transfer is happening at
the speed of a local pipe, whereas in the real world it would
bottle-neck on the network.
So any percentage speedups should be taken with a grain of salt.
But hopefully any regressions will produce enough of an effect to
be noticeable.
This script also demonstrates the recent improvement from dfa33a298d
(clone: do faster object check for partial clones, 2019-04-19):
Test dfa33a298d^ dfa33a298d
-------------------------------------------------------------------------
5600.2: clone without blobs 18.41(22.72+1.09) 6.83(11.65+0.50) -62.9%
5600.3: checkout of result 1.82(3.24+0.26) 1.84(3.24+0.26) +1.1%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Performance fix for "rev-list --parents -- pathspec".
* jk/revision-rewritten-parents-in-prio-queue:
revision: use a prio_queue to hold rewritten parents
Commit 05eb1c37ed (perf/aggregate: implement codespeed JSON output,
2018-01-05) added a dependency on the perl JSON module to show output
from aggregate.perl, but we only need it when the user asks for
--codespeed output. While the module is pretty common, it's not part of
the base system, and this dependency can get in the way of producing the
default human-readable output.
Let's bump the "use" down to a "require" in the code path that needs it,
which will be interpreted at run-time instead of compile-time. People
not using "--codespeed" won't even load the module, and anybody using it
should see the same results (including the same perl error if they don't
have it).
Note that this skips the importing step, so we'll have to fully qualify
our function call. We could accomplish the same thing in other ways.
E.g., calling JSON->import() ourselves, or wrapping "use JSON" in an
eval. Since there's only one such call, this seems like the
least-magical way of doing it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The p5302 script runs "index-pack --stdin" in each timing test. It does
two things to try to get good timings:
1. we do the repo creation in a separate (non-timed) setup test, so
that our timing is purely the index-pack run
2. we use a separate repo for each test; this is important because the
presence of existing objects in the repo influences the result
(because we'll end up doing collision checks against them)
But this forgets one thing: we generally run each timed test multiple
times to reduce the impact of noise. Which means that repeats of each
test after the first will be subject to the collision slowdown from
point 2, and we'll generally just end up taking the first time anyway.
Instead, let's create the repo in the test (effectively undoing point
1). That does add a constant amount of extra work to each iteration, but
it's quite small compared to the actual effects we're interested in
measuring.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch fixes a quadratic list insertion in rewrite_one() when
pathspec limiting is combined with --parents. What happens is something
like this:
1. We see that some commit X touches the path, so we try to rewrite
its parents.
2. rewrite_one() loops forever, rewriting parents, until it finds a
relevant parent (or hits the root and decides there are none). The
heavy lifting is done by process_parent(), which uses
try_to_simplify_commit() to drop parents.
3. process_parent() puts any intermediate parents into the
&revs->commits list, inserting by commit date as usual.
So if commit X is recent, and then there's a large chunk of history that
doesn't touch the path, we may add a lot of commits to &revs->commits.
And insertion by commit date is O(n) in the worst case, making the whole
thing quadratic.
We tried to deal with this long ago in fce87ae538 (Fix quadratic
performance in rewrite_one., 2008-07-12). In that scheme, we cache the
oldest commit in the list; if the new commit to be added is older, we
can start our linear traversal there. This often works well in practice
because parents are older than their descendants, and thus we tend to
add older and older commits as we traverse.
But this isn't guaranteed, and in fact there's a simple case where it is
not: merges. Imagine we look at the first parent of a merge and see a
very old commit (let's say 3 years old). And on the second parent, as we
go back 3 years in history, we might have many commits. That one
first-parent commit has polluted our oldest-commit cache; it will remain
the oldest while we traverse a huge chunk of history, during which we
have to fall back to the slow, linear method of adding to the list.
Naively, one might imagine that instead of caching the oldest commit,
we'd start at the last-added one. But that just makes some cases faster
while making others slower (and indeed, while it made a real-world test
case much faster, it does quite poorly in the perf test include here).
Fundamentally, these are just heuristics; our worst case is still
quadratic, and some cases will approach that.
Instead, let's use a data structure with better worst-case performance.
Swapping out revs->commits for something else would have repercussions
all over the code base, but we can take advantage of one fact: for the
rewrite_one() case, nobody actually needs to see those commits in
revs->commits until we've finished generating the whole list.
That leaves us with two obvious options:
1. We can generate the list _unordered_, which should be O(n), and
then sort it afterwards, which would be O(n log n) total. This is
"sort-after" below.
2. We can insert the commits into a separate data structure, like a
priority queue. This is "prio-queue" below.
I expected that sort-after would be the fastest (since it saves us the
extra step of copying the items into the linked list), but surprisingly
the prio-queue seems to be a bit faster.
Here are timings for the new p0001.6 for all three techniques across a
few repositories, as compared to master:
master cache-last sort-after prio-queue
--------------------------------------------------------------------------------------------
GIT_PERF_REPO=git.git
0.52(0.50+0.02) 0.53(0.51+0.02) +1.9% 0.37(0.33+0.03) -28.8% 0.37(0.32+0.04) -28.8%
GIT_PERF_REPO=linux.git
20.81(20.74+0.07) 20.31(20.24+0.07) -2.4% 0.94(0.86+0.07) -95.5% 0.91(0.82+0.09) -95.6%
GIT_PERF_REPO=llvm-project.git
83.67(83.57+0.09) 4.23(4.15+0.08) -94.9% 3.21(3.15+0.06) -96.2% 2.98(2.91+0.07) -96.4%
A few items to note:
- the cache-list tweak does improve the bad case for llvm-project.git
that started my digging into this problem. But it performs terribly
on linux.git, barely helping at all.
- the sort-after and prio-queue techniques work well. They approach
the timing for running without --parents at all, which is what you'd
expect (see below for more data).
- prio-queue just barely outperforms sort-after. As I said, I'm not
really sure why this is the case, but it is. You can see it even
more prominently in this real-world case on llvm-project.git:
git rev-list --parents 07ef786652e7 -- llvm/test/CodeGen/Generic/bswap.ll
where prio-queue routinely outperforms sort-after by about 7%. One
guess is that the prio-queue may just be more efficient because it
uses a compact array.
There are three new perf tests:
- "rev-list --parents" gives us a baseline for running with --parents.
This isn't sped up meaningfully here, because the bad case is
triggered only with simplification. But it's good to make sure we
don't screw it up (now, or in the future).
- "rev-list -- dummy" gives us a baseline for just traversing with
pathspec limiting. This gives a lower bound for the next test (and
it's also a good thing for us to be checking in general for
regressions, since we don't seem to have any existing tests).
- "rev-list --parents -- dummy" shows off the problem (and our fix)
Here are the timings for those three on llvm-project.git, before and
after the fix:
Test master prio-queue
------------------------------------------------------------------------------
0001.3: rev-list --parents 2.24(2.12+0.12) 2.22(2.11+0.11) -0.9%
0001.5: rev-list -- dummy 2.89(2.82+0.07) 2.92(2.89+0.03) +1.0%
0001.6: rev-list --parents -- dummy 83.67(83.57+0.09) 2.98(2.91+0.07) -96.4%
Changes in the first two are basically noise, and you can see we
approach our lower bound in the final one.
Note that we can't fully get rid of the list argument from
process_parents(). Other callers do have lists, and it would be hard to
convert them. They also don't seem to have this problem (probably
because they actually remove items from the list as they loop, meaning
it doesn't grow so large in the first place). So this basically just
drops the "cache_ptr" parameter (which was used only by the one caller
we're fixing here) and replaces it with a prio_queue. Callers are free
to use either data structure, depending on what they're prepared to
handle.
Reported-by: Björn Pettersson A <bjorn.a.pettersson@ericsson.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since its inception, the perf-lib.sh script has manually handled the
"--tee" option (and other options which imply it, like "--valgrind")
with a cut-and-pasted block from test-lib.sh. That block has grown stale
over the years, and has at least three problems:
1. It uses $SHELL to re-exec the script, whereas the version in
test-lib.sh learned to use $TEST_SHELL_PATH.
2. It does an ad-hoc search of the "$*" string, whereas test-lib.sh
learned to carefully parse the arguments left to right.
3. It never learned about --verbose-log (which also implies --tee),
so it would not trigger for that option.
This last one was especially annoying, because t/perf/run uses the
GIT_TEST_OPTS from your config.mak to run the perf scripts. So if you've
set, say, "-x --verbose-log" there, it will be passed as part of most
perf runs. And while this script doesn't recognize the option, the
test-lib.sh that we source _does_, and the behavior ends up being much
more annoying:
- as the comment at the top of the block says, we have to run this
tee code early, before we start munging variables (it says
GIT_BUILD_DIR, but the problematic variable is actually
GIT_TEST_INSTALLED).
- since we don't recognize --verbose-log, we don't trigger the block.
We go on to munge GIT_TEST_INSTALLED, converting it from a relative
to an absolute path.
- then we source test-lib.sh, which _does_ recognize --verbose-log. It
re-execs the script, which runs again. But this time with an
absolute version of GIT_TEST_INSTALLED.
- As a result, we copy the absolute version of GIT_TEST_INSTALLED into
perf_results_prefix. Instead of writing our results to the expected
"test-results/build_1234abcd.p1234-whatever.times", we instead write
them to "test-results/_full_path_to_repo_t_perf_build_1234...".
The aggregate.perl script doesn't expect this, and so it prints
"<missing>" for each result (even though it spent considerable time
running the tests!).
We can solve all of these in one blow by just deleting our custom
handling, and relying on the inclusion of test-lib.sh to handle --tee,
--verbose-log, etc.
There's one catch, though. We want to handle GIT_TEST_INSTALLED after
we've included test-lib.sh, since we want it un-munged in the re-exec'd
version of the script. But if we want to convert it from a relative
to an absolute path, we must do so before we load test-lib.sh, since it
will change our working directory. So we compute the absolute directory
first, store it away, then include test-lib.sh, and finally assign to
GIT_TEST_INSTALLED as appropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enabling pack.writebitmaphashcache should always be a performance win.
It costs only 4 bytes per object on disk, and the timings in ae4f07fbcc
(pack-bitmap: implement optional name_hash cache, 2013-12-21) show it
improving fetch and partial-bitmap clone times by 40-50%.
The only reason we didn't enable it by default at the time is that early
versions of JGit's bitmap reader complained about the presence of
optional header bits it didn't understand. But that was changed in
JGit's d2fa3987a (Use bitcheck to check for presence of OPT_FULL option,
2013-10-30), which made it into JGit v3.5.0 in late 2014.
So let's turn this option on by default. It's backwards-compatible with
all versions of Git, and if you are also using JGit on the same
repository, you'd only run into problems using a version that's almost 5
years old.
We'll drop the manual setting from all of our test scripts, including
perf tests. This isn't strictly necessary, but it has two advantages:
1. If the hash-cache ever stops being enabled by default, our perf
regression tests will notice.
2. We can use the modified perf tests to show off the behavior of an
otherwise unconfigured repo, as shown below.
These are the results of a few of a perf tests against linux.git that
showed interesting results. You can see the expected speedup in 5310.4,
which was noted in ae4f07fbcc. Curiously, 5310.8 did not improve (and
actually got slower), despite seeing the opposite in ae4f07fbcc.
I don't have an explanation for that.
The tests from p5311 did not exist back then, but do show improvements
(a smaller pack due to better deltas, which we found in less time).
Test HEAD^ HEAD
-------------------------------------------------------------------------------------
5310.4: simulated fetch 7.39(22.70+0.25) 5.64(11.43+0.22) -23.7%
5310.8: clone (partial bitmap) 18.45(24.83+1.19) 19.94(28.40+1.36) +8.1%
5311.31: server (128 days) 0.41(1.13+0.05) 0.34(0.72+0.02) -17.1%
5311.32: size (128 days) 7.4M 7.0M -4.8%
5311.33: client (128 days) 1.33(1.49+0.06) 1.29(1.37+0.12) -3.0%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pruning generally has to traverse the whole commit graph in order to
see which objects are reachable. This is the exact problem that
reachability bitmaps were meant to solve, so let's use them (if they're
available, of course).
Here are timings on git.git:
Test HEAD^ HEAD
------------------------------------------------------------------------
5304.6: prune with bitmaps 3.65(3.56+0.09) 1.01(0.92+0.08) -72.3%
And on linux.git:
Test HEAD^ HEAD
--------------------------------------------------------------------------
5304.6: prune with bitmaps 35.05(34.79+0.23) 3.00(2.78+0.21) -91.4%
The tests show a pretty optimal case, as we'll have just repacked and
should have pretty good coverage of all refs with our bitmaps. But
that's actually pretty realistic: normally prune is run via "gc" right
after repacking.
A few notes on the implementation:
- the change is actually in reachable.c, so it would improve
reachability traversals by "reflog expire --stale-fix", as well.
Those aren't performed regularly, though (a normal "git gc" doesn't
use --stale-fix), so they're not really worth measuring. There's a
low chance of regressing that caller, since the use of bitmaps is
totally transparent from the caller's perspective.
- The bitmap case could actually get away without creating a "struct
object", and instead the caller could just look up each object id in
the bitmap result. However, this would be a marginal improvement in
runtime, and it would make the callers much more complicated. They'd
have to handle both the bitmap and non-bitmap cases separately, and
in the case of git-prune, we'd also have to tweak prune_shallow(),
which relies on our SEEN flags.
- Because we do create real object structs, we go through a few
contortions to create ones of the right type. This isn't strictly
necessary (lookup_unknown_object() would suffice), but it's more
memory efficient to use the correct types, since we already know
them.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The general strategy of "git prune" is to do a full reachability walk,
then for each loose object see if we found it in our walk. But if we
don't have any loose objects, we don't need to do the expensive walk in
the first place.
This patch postpones that walk until the first time we need to see its
results.
Note that this is really a specific case of a more general optimization,
which is that we could traverse only far enough to find the object under
consideration (i.e., stop the traversal when we find it, then pick up
again when asked about the next object, etc). That could save us in some
instances from having to do a full walk. But it's actually a bit tricky
to do with our traversal code, and you'd need to do a full walk anyway
if you have even a single unreachable object (which you generally do, if
any objects are actually left after running git-repack).
So in practice this lazy-load of the full walk catches one easy but
common case (i.e., you've just repacked via git-gc, and there's nothing
unreachable).
The perf script is fairly contrived, but it does show off the
improvement:
Test HEAD^ HEAD
-------------------------------------------------------------------------
5304.4: prune with no objects 3.66(3.60+0.05) 0.00(0.00+0.00) -100.0%
and would let us know if we accidentally regress this optimization.
Note also that we need to take special care with prune_shallow(), which
relies on us having performed the traversal. So this optimization can
only kick in for a non-shallow repository. Since this is easy to get
wrong and is not covered by existing tests, let's add an extra test to
t5304 that covers this case explicitly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some of the functions in our test library check that they were invoked
properly with conditions like this:
test "$#" = 2 ||
error "bug in the test script: not 2 parameters to test-expect-success"
If this particular condition is triggered, then 'error' will abort the
whole test script with a bold red error message [1] right away.
However, under certain circumstances the test script will be aborted
completely silently, namely if:
- a similar condition in a test helper function like
'test_line_count' is triggered,
- which is invoked from the test script's "main" shell [2],
- and the test script is run manually (i.e. './t1234-foo.sh' as
opposed to 'make t1234-foo.sh' or 'make test') [3]
- and without the '--verbose' option,
because the error message is printed from within 'test_eval_', where
standard output is redirected either to /dev/null or to a log file.
The only indication that something is wrong is that not all tests in
the script are executed and at the end of the test script's output
there is no "# passed all N tests" message, which are subtle and can
easily go unnoticed, as I had to experience myself.
Send these "bug in the test script" error messages directly to the
test scripts standard error and thus to the terminal, so those bugs
will be much harder to overlook. Instead of updating all ~20 such
'error' calls with a redirection, let's add a BUG() function to
'test-lib.sh', wrapping an 'error' call with the proper redirection
and also including the common prefix of those error messages, and
convert all those call sites [4] to use this new BUG() function
instead.
[1] That particular error message from 'test_expect_success' is
printed in color only when running with or without '--verbose';
with '--tee' or '--verbose-log' the error is printed without
color, but it is printed to the terminal nonetheless.
[2] If such a condition is triggered in a subshell of a test, then
'error' won't be able to abort the whole test script, but only the
subshell, which in turn causes the test to fail in the usual way,
indicating loudly and clearly that something is wrong.
[3] Well, 'error' aborts the test script the same way when run
manually or by 'make' or 'prove', but both 'make' and 'prove' pay
attention to the test script's exit status, and even a silently
aborted test script would then trigger those tools' usual
noticable error messages.
[4] Strictly speaking, not all those 'error' calls need that
redirection to send their output to the terminal, see e.g.
'test_expect_success' in the opening example, but I think it's
better to be consistent.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
p3400 makes a copy of the current repository to test git-rebase
performance, and creates new branches in the copy with `git checkout
-b'. If the original repository has branches with the same name as the
script is trying to create, this operation will fail.
This replaces these calls by `git checkout -B' to force the creation and
update of these branches.
Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update fsck.skipList implementation and documentation.
* ab/fsck-skiplist:
fsck: support comments & empty lines in skipList
fsck: use oidset instead of oid_array for skipList
fsck: use strbuf_getline() to read skiplist file
fsck: add a performance test for skipList
fsck: add a performance test
fsck: document that skipList input must be unabbreviated
fsck: document and test commented & empty line skipList input
fsck: document and test sorted skipList input
fsck tests: add a test for no skipList input
fsck tests: setup of bogus commit object
Create a performance test to see how the skipList implementation
performs. First we setup N bad commits, then we see how progressively
working our way up to 0..N in increments of 10x does. I.e. the
needle(s) in the haystack get progressively more numerous.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a plain performance test for "fsck". This test will not be used to
/ referred to in any upcoming commit of mine in this series, but
having a simple test for fsck performance is valuable, so let's add it
while we're at it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A server with bitmapped packs can serve a clone very
quickly. However, fetches are not necessarily made any
faster, because we spend a lot less time in object traversal
(which is what bitmaps help with) and more time finding
deltas (because we may have to throw out on-disk deltas if
the client does not have the base).
As a first step to making this faster, this patch introduces
a new perf script to measure fetches into a repo of various
ages from a fully-bitmapped server.
We separately measure the work done by the server (in
pack-objects) and that done by the client (in index-pack).
Furthermore, we measure the size of the resulting pack.
Breaking it down like this (instead of just doing a regular
"git fetch") lets us see how much each side benefits from
any changes. And since we know the pack size, if we estimate
the network speed, then one could calculate a complete
wall-clock time for the operation (though the script does
not do this automatically).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The main objective of scripts in the perf framework is to
run "test_perf", which measures the time it takes to run
some operation. However, it can also be interesting to see
the change in the output size of certain operations.
This patch introduces test_size, which records a single
numeric output from the test and shows it in the aggregated
output (with pretty printing and relative size comparison).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will let us reuse the code when we add new values to
aggregate besides times.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
About half of test_perf() is boilerplate preparing to run
_any_ test, and the other half is specifically running a
timing test. Let's split it into two functions, so that we
can reuse the boilerplate in future commits.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When bisecting a performance regression using a config file,
`./bisect_regression --config my_perf.conf` for example, the
config file can contain Codespeed configuration which would
instruct the 'aggregate.perl' script called by the 'run'
script to output results in the Codespeed format and maybe
to try to send this output to a Codespeed server.
This is unfortunate because the 'bisect_run_script' relies
on the regular output from 'aggregate.perl' to mesure
performance, so let's disable Codespeed output and sending
results to a Codespeed server.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When passing an option '--foo' that it does not recognize, the
aggregate.perl script should die with an helpful error message
like:
Unknown option: foo
./aggregate.perl [options] [--] [<dir_or_rev>...] [--] \
[<test_script>...] >
Options:
--codespeed * Format output for Codespeed
--reponame <str> * Send given reponame to codespeed
--sort-by <str> * Sort output (only "regression" \
criteria is supported)
rather than:
fatal: Needed a single revision
rev-parse --verify --foo: command returned error: 128
To implement that let's use Getopt::Long for option parsing
instead of the current manual and sloppy parsing. This should
save some code and make option parsing simpler, tighter and
safer.
This will avoid something like 'foo--sort-by=regression' to
be handled as if '--sort-by=regression' had been used, for
example.
As Getopt::Long eats '--' at the end of options, this changes
a bit the way '--' is handled as we can now have '--' both
after the options and before the scripts.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new bisect_regression script can be used to automatically bisect
performance regressions. It will pass the new bisect_run_script to
`git bisect run`.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new option makes it possible to run perf tests as defined
in only one subsection of a config file.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Small test-helper programs have been consolidated into a single
binary.
* nd/combined-test-helper: (36 commits)
t/helper: merge test-write-cache into test-tool
t/helper: merge test-wildmatch into test-tool
t/helper: merge test-urlmatch-normalization into test-tool
t/helper: merge test-subprocess into test-tool
t/helper: merge test-submodule-config into test-tool
t/helper: merge test-string-list into test-tool
t/helper: merge test-strcmp-offset into test-tool
t/helper: merge test-sigchain into test-tool
t/helper: merge test-sha1-array into test-tool
t/helper: merge test-scrap-cache-tree into test-tool
t/helper: merge test-run-command into test-tool
t/helper: merge test-revision-walking into test-tool
t/helper: merge test-regex into test-tool
t/helper: merge test-ref-store into test-tool
t/helper: merge test-read-cache into test-tool
t/helper: merge test-prio-queue into test-tool
t/helper: merge test-path-utils into test-tool
t/helper: merge test-online-cpus into test-tool
t/helper: merge test-mktemp into test-tool
t/helper: merge (unused) test-mergesort into test-tool
...
One of the most interesting thing one can be interested in when
looking at performance test results is possible performance
regressions.
This new option makes it easy to spot such possible regressions.
This new option is named '--sort-by=regression' to make it
possible and easy to add other ways to sort the results, like for
example '--sort-by=utime'.
If we would like to sort according to how much the stime regressed
we could also add a new option called '--sort-by=regression:stime'.
Then '--sort-by=regression' could become a synonym for
'--sort-by=regression:rtime'.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new helper function will be reused in a subsequent
commit.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
9ba95ed23c (perf/run: update get_var_from_env_or_config() for
subsections) stopped setting a default value for GIT_PERF_REPEAT_COUNT
if no perf config file is present, because get_var_from_env_or_config
returns early in that case.
Fix it by setting the default value after calling this function. Its
fifth parameter is not used for any other variable, so remove the
associated code.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The build procedure for perl/ part has been greatly simplified by
weaning ourselves off of MakeMaker.
* ab/simplify-perl-makefile:
perl: treat PERLLIB_EXTRA as an extra path again
perl: avoid *.pmc and fix Error.pm further
Makefile: replace perl/Makefile.PL with simple make rules
It is much easier to diff the output against a previous
one when the fields are sorted.
Helped-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes it easier to use the aggregate script
on the command line when one wants to get the
"environment" fields set in the codespeed output.
Previously setting GIT_REPO_NAME was needed
for this purpose.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes it easier to use the aggregate script
on the command line, to get results from
subsections.
Previously setting GIT_PERF_SUBSECTION was needed
for this purpose.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"perf" test output can be sent to codespeed server.
* cc/codespeed:
perf/run: read GIT_PERF_REPO_NAME from perf.repoName
perf/run: learn to send output to codespeed server
perf/run: learn about perf.codespeedOutput
perf/run: add conf_opts argument to get_var_from_env_or_config()
perf/aggregate: implement codespeed JSON output
perf/aggregate: refactor printing results
perf/aggregate: fix checking ENV{GIT_PERF_SUBSECTION}
The GIT_PERF_REPO_NAME env variable is used in
the `aggregate.perl` script to set the 'environment'
field in the JSON Codespeed output.
Let's make it easy to set this variable by setting it
in a config file.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's make it possible to set in a config file the URL of
a codespeed server. And then let's make the `run` script
send the perf test results to this URL at the end of the
tests.
This should make is possible to easily automate the process
of running perf tests and having their results available in
Codespeed.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's make it possible to set in a config file the output
format (regular or codespeed) of the perf tests.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's make it possible to use `git config` type specifiers like
`--int` or `--bool`, so that config values are converted to the
canonical form and easier to use.
This additional argument is now the fourth argument of
get_var_from_env_or_config() instead of the fifth because we
want the default value argument to be unset if it is not
passed, and this is simpler if it is the last argument.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Codespeed (https://github.com/tobami/codespeed/) is an open source
project that can be used to track how some software performs over
time. It stores performance test results in a database and can show
nice graphs and charts on a web interface.
As it can be interesting to use Codespeed to see how Git performance
evolves over time and releases, let's implement a Codespeed output
in "perf/aggregate.perl".
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we want to implement another kind of output than
the current output for the perf test results, let's
refactor the existing code that outputs the results
in its own print_default_results() function.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The way we check ENV{GIT_PERF_SUBSECTION} could trigger
comparison between undef and "" that may be flagged by
use of strict & warnings. Let's fix that.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since 5b594f457a ("Threaded grep", 2010-01-25) the number of
threads git-grep uses under PTHREADS has been hardcoded to 8, but
there's no performance test to check whether this is an optimal
setting.
Amend the existing tests for the grep engines to support a mode where
this can be tested, e.g.:
GIT_PERF_GREP_THREADS='1 8 16' GIT_PERF_LARGE_REPO=~/g/linux ./run p782*
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The return code of command -v with a non-existing command is 1 in bash
and 127 in dash. Use that return code directly to allow the script to
work with dash and without watchman (e.g. on Debian).
While at it stop redirecting the output. stderr is redirected to
/dev/null by test_lazy_prereq already, and stdout can actually be
useful -- the path of the found watchman executable is sent there, but
it's shown only if the script was run with --verbose.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Acked-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the perl/Makefile.PL and the fallback perl/Makefile used under
NO_PERL_MAKEMAKER=NoThanks with a much simpler implementation heavily
inspired by how the i18n infrastructure's build process works[1].
The reason for having the Makefile.PL in the first place is that it
was initially[2] building a perl C binding to interface with libgit,
this functionality, that was removed[3] before Git.pm ever made it to
the master branch.
We've since since started maintaining a fallback perl/Makefile, as
MakeMaker wouldn't work on some platforms[4]. That's just the tip of
the iceberg. We have the PM.stamp hack in the top-level Makefile[5] to
detect whether we need to regenerate the perl/perl.mak, which I fixed
just recently to deal with issues like the perl version changing from
under us[6].
There is absolutely no reason for why this needs to be so complex
anymore. All we're getting out of this elaborate Rube Goldberg machine
was copying perl/* to perl/blib/* as we do a string-replacement on
the *.pm files to hardcode @@LOCALEDIR@@ in the source, as well as
pod2man-ing Git.pm & friends.
So replace the whole thing with something that's pretty much a copy of
how we generate po/build/**.mo from po/*.po, just with a small sed(1)
command instead of msgfmt. As that's being done rename the files
from *.pm to *.pmc just to indicate that they're generated (see
"perldoc -f require").
While I'm at it, change the fallback for Error.pm from being something
where we'll ship our own Error.pm if one doesn't exist at build time
to one where we just use a Git::Error wrapper that'll always prefer
the system-wide Error.pm, only falling back to our own copy if it
really doesn't exist at runtime. It's now shipped as
Git::FromCPAN::Error, making it easy to add other modules to
Git::FromCPAN::* in the future if that's needed.
Functional changes:
* This will not always install into perl's idea of its global
"installsitelib". This only potentially matters for packagers that
need to expose Git.pm for non-git use, and as explained in the
INSTALL file there's a trivial workaround.
* The scripts themselves will 'use lib' the target directory, but if
INSTLIBDIR is set it overrides it. It doesn't have to be this way,
it could be set in addition to INSTLIBDIR, but my reading of [7] is
that this is the desired behavior.
* We don't build man pages for all of the perl modules as we used to,
only Git(3pm). As discussed on-list[8] that we were building
installed manpages for purely internal APIs like Git::I18N or
private-Error.pm was always a bug anyway, and all the Git::SVN::*
ones say they're internal APIs.
There are apparently external users of Git.pm, but I don't expect
there to be any of the others.
As a side-effect of these general changes the perl documentation
now only installed by install-{doc,man}, not a mere "install" as
before.
1. 5e9637c629 ("i18n: add infrastructure for translating Git with
gettext", 2011-11-18)
2. b1edc53d06 ("Introduce Git.pm (v4)", 2006-06-24)
3. 18b0fc1ce1 ("Git.pm: Kill Git.xs for now", 2006-09-23)
4. f848718a69 ("Make perl/ build procedure ActiveState friendly.",
2006-12-04)
5. ee9be06770 ("perl: detect new files in MakeMaker builds",
2012-07-27)
6. c59c4939c2 ("perl: regenerate perl.mak if perl -V changes",
2017-03-29)
7. 0386dd37b1 ("Makefile: add PERLLIB_EXTRA variable that adds to
default perl path", 2013-11-15)
8. 87bmjjv1pu.fsf@evledraar.booking.com ("Re: [PATCH] Makefile:
replace perl/Makefile.PL with simple make rules"
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Internaly we use 0{40} as a placeholder object name to signal the
codepath that there is no such object (e.g. the fast-forward check
while "git fetch" stores a new remote-tracking ref says "we know
there is no 'old' thing pointed at by the ref, as we are creating
it anew" by passing 0{40} for the 'old' side), and expect that a
codepath to locate an in-core object to return NULL as a sign that
the object does not exist. A look-up for an object that does not
exist however is quite costly with a repository with large number
of packfiles. This access pattern has been optimized.
* jk/fewer-pack-rescan:
sha1_file: fast-path null sha1 as a missing object
everything_local: use "quick" object existence check
p5551: add a script to test fetch pack-dir rescans
t/perf/lib-pack: use fast-import checkpoint to create packs
p5550: factor out nonsense-pack creation
* cc/perf-run-config:
perf: store subsection results in "test-results/$GIT_PERF_SUBSECTION/"
perf/run: show name of rev being built
perf/run: add run_subsection()
perf/run: update get_var_from_env_or_config() for subsections
perf/run: add get_subsections()
perf/run: add calls to get_var_from_env_or_config()
perf/run: add GIT_PERF_DIRS_OR_REVS
perf/run: add get_var_from_env_or_config()
perf/run: add '--config' option to the 'run' script
Replace use of strbuf_addf() with strbuf_add() when enumerating
loose objects in for_each_file_in_obj_subdir(). Since we already
check the length and hex-values of the string before consuming
the path, we can prevent extra computation by using the lower-
level method.
One consumer of for_each_file_in_obj_subdir() is the abbreviation
code. OID abbreviations use a cached list of loose objects (per
object subdirectory) to make repeated queries fast, but there is
significant cache load time when there are many loose objects.
Most repositories do not have many loose objects before repacking,
but in the GVFS case the repos can grow to have millions of loose
objects. Profiling 'git log' performance in GitForWindows on a
GVFS-enabled repo with ~2.5 million loose objects revealed 12% of
the CPU time was spent in strbuf_addf().
Add a new performance test to p4211-line-log.sh that is more
sensitive to this cache-loading. By limiting to 1000 commits, we
more closely resemble user wait time when reading history into a
pager.
For a copy of the Linux repo with two ~512 MB packfiles and ~572K
loose objects, running 'git log --oneline --parents --raw -1000'
had the following performance:
HEAD~1 HEAD
----------------------------------------
7.70(7.15+0.54) 7.44(7.09+0.29) -3.4%
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We learned to talk to watchman to speed up "git status" and other
operations that need to see which paths have been modified.
* bp/fsmonitor:
fsmonitor: preserve utf8 filenames in fsmonitor-watchman log
fsmonitor: read entirety of watchman output
fsmonitor: MINGW support for watchman integration
fsmonitor: add a performance test
fsmonitor: add a sample integration script for Watchman
fsmonitor: add test cases for fsmonitor extension
split-index: disable the fsmonitor extension when running the split index test
fsmonitor: add a test tool to dump the index extension
update-index: add fsmonitor support to update-index
ls-files: Add support in ls-files to display the fsmonitor valid bit
fsmonitor: add documentation for the fsmonitor extension.
fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
update-index: add a new --force-write-index option
preload-index: add override to enable testing preload-index
bswap: add 64 bit endianness helper get_be64
Since fetch often deals with object-ids we don't have (yet),
it's an easy mistake for it to use a function like
parse_object() that gives the correct result (e.g., NULL)
but does so very slowly (because after failing to find the
object, we re-scan the pack directory looking for new
packs).
The regular test suite won't catch this because the end
result is correct, but we would want to know about
performance regressions, too. Let's add a test to the
regression suite.
Note that this uses a synthetic repository that has a large
number of packs. That's not ideal, as it means we're not
testing what "normal" users see (in fact, some of these
problems have existed for ages without anybody noticing
simply because a rescan on a normal repository just isn't
that expensive).
So what we're really looking for here is the spike you'd
notice in a pathological case (a lot of unknown objects
coming into a repo with a lot of packs). If that's fast,
then the normal cases should be, too.
Note that the test also makes liberal use of $MODERN_GIT for
setup; some of these regressions go back a ways, and we
should be able to use it to find the problems there.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently use fast-import only to create a large number
of objects, and then run O(n) invocations of pack-objects to
turn them into packs.
We can do this faster by just asking fast-import to
checkpoint and create a pack for each (after telling it
not to turn loose tiny packs).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a function to create a bunch of irrelevant packs to
measure the expense of reprepare_packed_git(). Let's make
that available to other perf scripts.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new perf test for testing the performance of log while computing
OID abbreviations. Using --oneline --raw and --parents options maximizes
the number of OIDs to abbreviate while still spending some time computing
diffs.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test utility (test-drop-caches) that flushes all changes to disk
then drops file system cache on Windows, Linux, and OSX.
Add a perf test (p7519-fsmonitor.sh) for fsmonitor.
By default, the performance test will utilize the Watchman file system
monitor if it is installed. If Watchman is not installed, it will use a
dummy integration script that does not report any new or modified files.
The dummy script has very little overhead which provides optimistic results.
The performance test will also use the untracked cache feature if it is
available as fsmonitor uses it to speed up scanning for untracked files.
There are 4 environment variables that can be used to alter the default
behavior of the performance test:
GIT_PERF_7519_UNTRACKED_CACHE: used to configure core.untrackedCache
GIT_PERF_7519_SPLIT_INDEX: used to configure core.splitIndex
GIT_PERF_7519_FSMONITOR: used to configure core.fsmonitor
GIT_PERF_7519_DROP_CACHE: if set, the OS caches are dropped between tests
The big win for using fsmonitor is the elimination of the need to scan the
working directory looking for changed and untracked files. If the file
information is all cached in RAM, the benefits are reduced.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When tests are run for a subsection defined in a config file, it is
better if the results for the current subsection are not overwritting
the results of a previous subsection.
So let's store the results for a subsection in a subdirectory of
"test-results/" with the subsection name.
The aggregate.perl, when it is run for a subsection, should then
aggregate the results found in "test-results/$GIT_PERF_SUBSECTION/".
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is nice for the user to not just show the sha1 of the
current revision being built but also the actual name of
this revision.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's actually use the subsections we find in the config file
to run the perf tests separately for each subsection.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we will set some config options in subsections, let's
teach get_var_from_env_or_config() to get the config options
from the subsections if they are set there.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function makes it possible to find subsections, so that
we will be able to run different tests for different subsections
in a later commit.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These calls make it possible to have the make command or the
make options in a config file, instead of in environment
variables.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This environment variable can be set to some revisions or
directories whose Git versions should be tested, in addition
to the revisions or directories passed as arguments to the
'run' script.
This enables a "perf.dirsOrRevs" configuration variable to
be used to set revisions or directories whose Git versions
should be tested.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add get_var_from_env_or_config() to easily set variables
from a config file if they are defined there and not already set.
This can also set them to a default value if one is provided.
As an example, use this function to set GIT_PERF_REPEAT_COUNT
from the perf.repeatCount config option or from the default
value.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is error prone and tiring to use many long environment
variables to give parameters to the 'run' script.
Let's make it easy to store some parameters in a config
file and to pass them to the run script.
The GIT_PERF_CONFIG_FILE variable will be set to the
argument of the '--config' option. This variable is not
used yet. It will be used in a following commit.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A performance test for writing the index to be able to
determine if changes to allocating ondisk structure help.
Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Optimize "what are the object names already taken in an alternate
object database?" query that is used to derive the length of prefix
an object name is uniquely abbreviated to.
* rs/sha1-name-readdir-optim:
sha1_file: guard against invalid loose subdirectory numbers
sha1_file: let for_each_file_in_obj_subdir() handle subdir names
p4205: add perf test script for pretty log formats
sha1_name: cache readdir(3) results in find_short_object_filename()
Add simple performance tests for expanded log format placeholders.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
perf-test update.
* jh/memihash-opt:
p0004: don't error out if test repo is too small
p0004: don't abort if multi-threaded is too slow
p0004: use test_perf
p0004: avoid using pipes
p0004: simplify calls of test-lazy-init-name-hash
When the tested repo has an index.lock file it should be removed. This
file may be present if e.g. git-status previously crashed in that
repo, and it will make a lot of git commands fail. Let's try harder
and remove the lock.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The internal implementation of "git grep" has seen some clean-up.
* ab/grep-preparatory-cleanup: (31 commits)
grep: assert that threading is enabled when calling grep_{lock,unlock}
grep: given --threads with NO_PTHREADS=YesPlease, warn
pack-objects: fix buggy warning about threads
pack-objects & index-pack: add test for --threads warning
test-lib: add a PTHREADS prerequisite
grep: move is_fixed() earlier to avoid forward declaration
grep: change internal *pcre* variable & function names to be *pcre1*
grep: change the internal PCRE macro names to be PCRE1
grep: factor test for \0 in grep patterns into a function
grep: remove redundant regflags assignments
grep: catch a missing enum in switch statement
perf: add a comparison test of log --grep regex engines with -F
perf: add a comparison test of log --grep regex engines
perf: add a comparison test of grep regex engines with -F
perf: add a comparison test of grep regex engines
perf: emit progress output when unpacking & building
perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do
grep: add tests to fix blind spots with \0 patterns
grep: prepare for testing binary regexes containing rx metacharacters
grep: add a test helper function for less verbose -f \0 tests
...
perf-test update.
* jh/memihash-opt:
p0004: don't error out if test repo is too small
p0004: don't abort if multi-threaded is too slow
p0004: use test_perf
p0004: avoid using pipes
p0004: simplify calls of test-lazy-init-name-hash
Add perf-test for wildmatch.
* ab/perf-wildmatch:
perf: add test showing exponential growth in path globbing
perf: add function to setup a fresh test repo
Add a performance comparison test of grep regex engines given fixed
strings.
The current logic in compile_regexp() ignores the engine parameter and
uses kwset() to search for these, so this test shows no difference
between engines right now:
$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run p7821-grep-engines-fixed.sh
[...]
Test this tree
------------------------------------------------
7821.1: fixed grep int 0.56(1.67+0.68)
7821.2: basic grep int 0.57(1.70+0.57)
7821.3: extended grep int 0.59(1.76+0.51)
7821.4: perl grep int 1.08(1.71+0.55)
7821.6: fixed grep uncommon 0.23(0.55+0.50)
7821.7: basic grep uncommon 0.24(0.55+0.50)
7821.8: extended grep uncommon 0.26(0.55+0.52)
7821.9: perl grep uncommon 0.24(0.58+0.47)
7821.11: fixed grep æ 0.36(1.30+0.42)
7821.12: basic grep æ 0.36(1.32+0.40)
7821.13: extended grep æ 0.38(1.30+0.42)
7821.14: perl grep æ 0.35(1.24+0.48)
Only when run with -i via GIT_PERF_7821_GREP_OPTS=' -i' do we avoid
avoid going through the same kwset.[ch] codepath, see the "Even when
-F..." comment in grep.c. This only kicks for the non-ASCII case:
$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_7821_GREP_OPTS=' -i' ./run p7821-grep-engines-fixed.sh
[...]
Test this tree
---------------------------------------------------
7821.1: fixed grep -i int 0.62(2.10+0.57)
7821.2: basic grep -i int 0.68(1.90+0.61)
7821.3: extended grep -i int 0.78(1.94+0.57)
7821.4: perl grep -i int 0.98(1.78+0.74)
7821.6: fixed grep -i uncommon 0.24(0.44+0.64)
7821.7: basic grep -i uncommon 0.25(0.56+0.54)
7821.8: extended grep -i uncommon 0.27(0.62+0.45)
7821.9: perl grep -i uncommon 0.24(0.59+0.49)
7821.11: fixed grep -i æ 0.30(0.96+0.39)
7821.12: basic grep -i æ 0.27(0.92+0.44)
7821.13: extended grep -i æ 0.28(0.90+0.46)
7821.14: perl grep -i æ 0.28(0.74+0.49)
I'm planning to change how fixed-string searching happens. This test
gives a baseline for comparing performance before & after any such
change.
See commit ("perf: add a comparison test of grep regex engines",
2017-04-19) for details on the machine the above test run was executed
on.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a very basic performance comparison test comparing the POSIX
basic, extended and perl engines.
In theory the "basic" and "extended" engines should be implemented
using the same underlying code with a slightly different pattern
parser, but some implementations may not do this. Jump through some
slight hoops to test both, which is worthwhile since "basic" is the
default.
Running this on an i7 3.4GHz Linux 4.9.0-2 Debian testing against a
checkout of linux.git & latest upstream PCRE, both PCRE and git
compiled with -O3 using gcc 7.1.1:
$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux ./run p7820-grep-engines.sh
[...]
Test this tree
---------------------------------------------------------------
7820.1: basic grep 'how.to' 0.34(1.24+0.53)
7820.2: extended grep 'how.to' 0.33(1.23+0.45)
7820.3: perl grep 'how.to' 0.31(1.05+0.56)
7820.5: basic grep '^how to' 0.32(1.24+0.42)
7820.6: extended grep '^how to' 0.33(1.20+0.44)
7820.7: perl grep '^how to' 0.57(2.67+0.42)
7820.9: basic grep '[how] to' 0.51(2.16+0.45)
7820.10: extended grep '[how] to' 0.49(2.20+0.43)
7820.11: perl grep '[how] to' 0.56(2.60+0.43)
7820.13: basic grep '\(e.t[^ ]*\|v.ry\) rare' 0.66(3.25+0.40)
7820.14: extended grep '(e.t[^ ]*|v.ry) rare' 0.65(3.19+0.46)
7820.15: perl grep '(e.t[^ ]*|v.ry) rare' 1.05(5.74+0.34)
7820.17: basic grep 'm\(ú\|u\)lt.b\(æ\|y\)te' 0.34(1.28+0.47)
7820.18: extended grep 'm(ú|u)lt.b(æ|y)te' 0.34(1.38+0.38)
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te' 0.39(1.56+0.44)
Options can also be passed to git-grep via the GIT_PERF_7820_GREP_OPTS
environment variable. There are various modes such as "-v" that have
very different performance profiles, but handling the combinatorial
explosion of testing all those options would make this script much
more complex and harder to maintain. Instead just add the ability to
do one-shot runs with arbitrary options, e.g.:
$ GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_7820_GREP_OPTS=" -i" ./run p7820-grep-engines.sh
[...]
Test this tree
------------------------------------------------------------------
7820.1: basic grep -i 'how.to' 0.49(1.72+0.38)
7820.2: extended grep -i 'how.to' 0.46(1.64+0.42)
7820.3: perl grep -i 'how.to' 0.44(1.45+0.45)
7820.5: basic grep -i '^how to' 0.47(1.76+0.38)
7820.6: extended grep -i '^how to' 0.47(1.70+0.42)
7820.7: perl grep -i '^how to' 0.65(2.72+0.37)
7820.9: basic grep -i '[how] to' 0.86(3.64+0.42)
7820.10: extended grep -i '[how] to' 0.84(3.62+0.46)
7820.11: perl grep -i '[how] to' 0.73(3.06+0.39)
7820.13: basic grep -i '\(e.t[^ ]*\|v.ry\) rare' 1.63(8.13+0.36)
7820.14: extended grep -i '(e.t[^ ]*|v.ry) rare' 1.64(8.01+0.44)
7820.15: perl grep -i '(e.t[^ ]*|v.ry) rare' 1.44(6.88+0.44)
7820.17: basic grep -i 'm\(ú\|u\)lt.b\(æ\|y\)te' 0.66(2.67+0.44)
7820.18: extended grep -i 'm(ú|u)lt.b(æ|y)te' 0.66(2.67+0.43)
7820.19: perl grep -i 'm(ú|u)lt.b(æ|y)te' 0.59(2.31+0.37)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Amend the t/perf/run output so that in addition to the "Running N
tests" heading currently being emitted, it also emits "Unpacking $rev"
and "Building $rev" when setting up the build/$rev directory & when
building it, respectively.
This makes it easier to see what's going on and what revision is being
tested as the output scrolls by.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a git GIT_PERF_MAKE_COMMAND variable to compliment the existing
GIT_PERF_MAKE_OPTS facility. This allows specifying an arbitrary shell
command to execute instead of 'make'.
This is useful e.g. in cases where the name, semantics or defaults of
a Makefile flag have changed over time. It can even be used to change
the contents of the tree, useful for monkeypatching ancient versions
of git to get them to build.
This opens Pandora's box in some ways, it's now possible to
"jailbreak" the perf environment and e.g. modify the source tree via
this arbitrary instead of just issuing a custom "make" command, such a
command has to be re-entrant in the sense that subsequent perf runs
will re-use the possibly modified tree.
It would be pointless to try to mitigate or work around that caveat in
a tool purely aimed at Git developers, so this change makes no attempt
to do so.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Repositories with less than 4000 entries are always handled using a
single thread, causing test-lazy-init-name-hash --multi to error out.
Don't abort the whole test script in that case, but simply skip the
multi-threaded performance check. We can still use it to compare the
single-threaded speed of different versions in that case.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Acked-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the single-threaded variant beats the multi-threaded one then we may
have a performance bug, but that doesn't justify aborting the test.
Drop that check; we can compare the results for --single and --multi
using the actual performance tests.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Acked-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The perf test suite (more specifically: t/perf/aggregate.perl) requires
each test script to write test results into a file, otherwise it aborts
when aggregating. Add actual performance tests with test_perf to allow
p0004 to be run together with other perf scripts.
Calibrate the value for the parameter --count based on the size of the
test repository, in order to get meaningful results with smaller repos
yet still be able to finish the script against huge ones without having
to wait for hours.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Acked-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The return code of commands on the producing end of a pipe is ignored.
Evaluate the outcome of test-lazy-init-name-hash by calling sort
separately.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Acked-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test library puts helpers into $PATH, so we can simply call them
without specifying their location.
The suffix $X is also not necessary because .exe files on Windows can be
started without specifying their extension, and on other platforms it's
empty anyway.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Acked-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test showing that runtimes of the wildmatch() function used for
globbing in git grow exponentially in the face of some pathological
globs.
This issue affects both globs matching filenames via e.g. ls-files,
and globs matching refnames via e.g. for-each-ref.
As noted in the test description this is a test to see whether Git
suffers from the issue noted in an article Russ Cox posted today about
common bugs in various glob implementations:
https://research.swtch.com/glob
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function to setup a fresh test repo via 'git init' to compliment
the existing functions to copy over a normal & large repo.
Some performance tests don't need any existing repository data at all
to be significant, e.g. tests which stress glob matches against single
pathological revisions or files, which I'm about to add in a
subsequent commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rebasing onto many changes is interesting, but it's also
interesting to see what happens when rebasing many changes.
And while at it, let's also look at the impact of using a
split index.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git checkout" that handles a lot of paths has been optimized by
reducing the number of unnecessary checks of paths in the
has_dir_name() function.
* jh/add-index-entry-optim:
read-cache: speed up has_dir_name (part 2)
read-cache: speed up has_dir_name (part 1)
read-cache: speed up add_index_entry during checkout
p0006-read-tree-checkout: perf test to time read-tree
read-cache: add strcmp_offset function
The string-list API used a custom reallocation strategy that was
very inefficient, instead of using the usual ALLOC_GROW() macro,
which has been fixed.
* jh/string-list-micro-optim:
string-list: use ALLOC_GROW macro when reallocing string_list
Change the test descriptions from being treated as binary blobs by
perl to being treated as UTF-8. This ensures that e.g. a test
description like "æ" is counted as 1 character, not 2.
I have WIP performance tests for non-ASCII grep patterns on another
topic that are affected by this.
Now instead of:
$ ./run p0000-perf-lib-sanity.sh
[...]
0000.4: export a weird var 0.00(0.00+0.00)
0000.5: éḿíẗ ńöń-ÁŚĆÍÍ ćḧáŕáćẗéŕś 0.00(0.00+0.00)
0000.7: important variables available in subshells 0.00(0.00+0.00)
[...]
We emit:
[...]
0000.4: export a weird var 0.00(0.00+0.00)
0000.5: éḿíẗ ńöń-ÁŚĆÍÍ ćḧáŕáćẗéŕś 0.00(0.00+0.00)
0000.7: important variables available in subshells 0.00(0.00+0.00)
[...]
Fixes code originally added in 342e9ef2d9 ("Introduce a performance
testing framework", 2012-02-17).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hotfix for a topic that is already in 'master'.
* jh/memihash-opt:
p0004: make perf test executable
t3008: skip lazy-init test on a single-core box
test-online-cpus: helper to return cpu count
name-hash: fix buffer overrun
Created t/perf/repos/many-files.sh to generate large, but
artificial repositories.
Created t/perf/inflate-repo.sh to alter an EXISTING repo
to have a set of large commits. This can be used to create
a branch with 1M+ files in repositories like git.git or
linux.git, but with more realistic content. It does this
by making multiple copies of the entire worktree in a series
of sub-directories.
The branch name and ballast structure created by both scripts
match, so either script can be used to generate very large
test repositories for the following perf test.
Created t/perf/p0006-read-tree-checkout.sh to measure
performance on various read-tree, checkout, and update-index
operations. This test can run using either normal repos or
ones from the above scripts.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It looks like in 89c3b0ad43 (name-hash: add perf test for lazy_init_name_hash,
2017-03-23) p0004 was not created with the execute unix rights.
Let's fix that.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use ALLOC_GROW() macro when reallocing a string_list array
rather than simply increasing it by 32. This is a performance
optimization.
During status on a very large repo and there are many changes,
a significant percentage of the total run time is spent
reallocing the wt_status.changes array.
This change decreases the time in wt_status_collect_changes_worktree()
from 125 seconds to 45 seconds on my very large repository.
This produced a modest gain on my 1M file artificial repo, but
broke even on linux.git.
Test HEAD^^ HEAD
---------------------------------------------------------------------------------------
0005.2: read-tree status br_ballast (1000001) 8.29(5.62+2.62) 8.22(5.57+2.63) -0.8%
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The name-hash used for detecting paths that are different only in
cases (which matter on case insensitive filesystems) has been
optimized to take advantage of multi-threading when it makes sense.
* jh/memihash-opt:
name-hash: add test-lazy-init-name-hash to .gitignore
name-hash: add perf test for lazy_init_name_hash
name-hash: add test-lazy-init-name-hash
name-hash: perf improvement for lazy_init_name_hash
hashmap: document memihash_cont, hashmap_disallow_rehash api
hashmap: add disallow_rehash setting
hashmap: allow memihash computation to be continued
name-hash: specify initial size for istate.dir_hash table
Created t/perf/p0004-lazy-init-name-hash.sh test
to demonstrate correctness and performance gains
with the multithreaded version of lazy_init_name_hash().
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git filter-branch --prune-empty" drops a single-parent commit that
becomes a no-op, but did not drop a root commit whose tree is empty.
* dp/filter-branch-prune-empty:
p7000: add test for filter-branch with --prune-empty
filter-branch: fix --prune-empty on parentless commits
t7003: ensure --prune-empty removes entire branch when applicable
t7003: ensure --prune-empty can prune root commit
It's tempting to say:
./run v1.0.0 HEAD
to see how we've sped up Git over the years. Unfortunately,
this doesn't quite work because versions of Git prior to
v1.7.0 lack bin-wrappers, so our "run" script doesn't
correctly put them in the PATH.
Worse, it means we silently find whatever other "git" is in
the PATH, and produce test results that have no bearing on
what we asked for.
Let's fallback to the main git directory when bin-wrappers
isn't present. Many modern perf scripts won't run with such
an antique version of Git, of course, but at least those
failures are detected and reported (and you're free to write
a limited perf script that works across many versions).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 1a0962dee (t/perf: fix regression in testing older
versions of git, 2016-06-22), we point "$MODERN_GIT" to a
copy of git that matches the t/perf script itself, and which
can be used for tasks outside of the actual timings. This is
needed because the setup done by perf scripts keeps moving
forward in time, and may use features that the older
versions of git we are testing do not have.
That commit used $MODERN_GIT to fix a case where we relied
on the relatively recent --git-path option. But if you go
back further still, there are more problems.
Since 7501b5921 (perf: make the tests work in worktrees,
2016-05-13), we use "git -C", but versions of git older than
44e1e4d67 (git: run in a directory given with -C option,
2013-09-09) don't know about "-C". So testing an old version
of git with a new version of t/perf will fail the setup
step.
We can fix this by using $MODERN_GIT during the setup;
there's no need to use the antique version, since it doesn't
affect the timings. Likewise, we'll adjust the "init"
invocation; antique versions of git called this "init-db".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In p0001, a variable was created in a test_expect_success block to be
used in later test_perf blocks, but was not exported. This caused the
variable to not appear in those blocks (this can be verified by writing
'test -n "$commit"' in those blocks), resulting in a slightly different
invocation than what was intended. Export that variable.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adjust a perf test to new world order where commands that do
require a repository are really strict about having a repository.
* rs/p5302-create-repositories-before-tests:
p5302: create repositories for index-pack results explicitly
Before 7176a314 (index-pack: complain when --stdin is used outside of a
repo) index-pack silently created a non-existing target directory; now
the command refuses to work unless it's used against a valid repository.
That causes p5302 to fail, which relies on the former behavior. Fix it
by setting up the destinations for its performance tests using git init.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a sort command to test-string-list that reads lines from stdin,
stores them in a string_list and then sorts it. Use it in a simple
perf test script to measure the performance of string_list_sort().
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching from a remote that has many tags that are irrelevant
to branches we are following, we used to waste way too many cycles
when checking if the object pointed at by a tag (that we are not
going to fetch!) exists in our repository too carefully.
* jk/fetch-quick-tag-following:
fetch: use "quick" has_sha1_file for tag following
When we auto-follow tags in a fetch, we look at all of the
tags advertised by the remote and fetch ones where we don't
already have the tag, but we do have the object it peels to.
This involves a lot of calls to has_sha1_file(), some of
which we can reasonably expect to fail. Since 45e8a74
(has_sha1_file: re-check pack directory before giving up,
2013-08-30), this may cause many calls to
reprepare_packed_git(), which is potentially expensive.
This has gone unnoticed for several years because it
requires a fairly unique setup to matter:
1. You need to have a lot of packs on the client side to
make reprepare_packed_git() expensive (the most
expensive part is finding duplicates in an unsorted
list, which is currently quadratic).
2. You need a large number of tag refs on the server side
that are candidates for auto-following (i.e., that the
client doesn't have). Each one triggers a re-read of
the pack directory.
3. Under normal circumstances, the client would
auto-follow those tags and after one large fetch, (2)
would no longer be true. But if those tags point to
history which is disconnected from what the client
otherwise fetches, then it will never auto-follow, and
those candidates will impact it on every fetch.
So when all three are true, each fetch pays an extra
O(nr_tags * nr_packs^2) cost, mostly in string comparisons
on the pack names. This was exacerbated by 47bf4b0
(prepare_packed_git_one: refactor duplicate-pack check,
2014-06-30) which uses a slightly more expensive string
check, under the assumption that the duplicate check doesn't
happen very often (and it shouldn't; the real problem here
is how often we are calling reprepare_packed_git()).
This patch teaches fetch to use HAS_SHA1_QUICK to sacrifice
accuracy for speed, in cases where we might be racy with a
simultaneous repack. This is similar to the fix in 0eeb077
(index-pack: avoid excessive re-reading of pack directory,
2015-06-09). As with that case, it's OK for has_sha1_file()
occasionally say "no I don't have it" when we do, because
the worst case is not a corruption, but simply that we may
fail to auto-follow a tag that points to it.
Here are results from the included perf script, which sets
up a situation similar to the one described above:
Test HEAD^ HEAD
----------------------------------------------------------
5550.4: fetch 11.21(10.42+0.78) 0.08(0.04+0.02) -99.3%
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Performance tests done via "t/perf" did not use the same set of
build configuration if the user relied on autoconf generated
configuration.
* ks/perf-build-with-autoconf:
t/perf/run: copy config.mak.autogen & friends to build area
Some codepaths in "git pack-objects" were not ready to use an
existing pack bitmap; now they are and as the result they have
become faster.
* ks/pack-objects-bitmap:
pack-objects: use reachability bitmap index when generating non-stdout pack
pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
Otherwise for people who use autotools-based configure in main worktree,
the performance testing results will be inconsistent as work and build
trees could be using e.g. different optimization levels.
See e.g.
http://public-inbox.org/git/20160818175222.bmm3ivjheokf2qzl@sigill.intra.peff.net/
for example.
NOTE config.status has to be copied because otherwise without it the build
would want to run reconfigure this way loosing just copied config.mak.autogen.
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects)
if a repository has bitmap index, pack-objects can nicely speedup
"Counting objects" graph traversal phase. That however was done only for
case when resultant pack is sent to stdout, not written into a file.
The reason here is for on-disk repack by default we want:
- to produce good pack (with bitmap index not-yet-packed objects are
emitted to pack in suboptimal order).
- to use more robust pack-generation codepath (avoiding possible
bugs in bitmap code and possible bitmap index corruption).
Jeff King further explains:
The reason for this split is that pack-objects tries to determine how
"careful" it should be based on whether we are packing to disk or to
stdout. Packing to disk implies "git repack", and that we will likely
delete the old packs after finishing. We want to be more careful (so
as not to carry forward a corruption, and to generate a more optimal
pack), and we presumably run less frequently and can afford extra CPU.
Whereas packing to stdout implies serving a remote via "git fetch" or
"git push". This happens more frequently (e.g., a server handling many
fetching clients), and we assume the receiving end takes more
responsibility for verifying the data.
But this isn't always the case. One might want to generate on-disk
packfiles for a specialized object transfer. Just using "--stdout" and
writing to a file is not optimal, as it will not generate the matching
pack index.
So it would be useful to have some way of overriding this heuristic:
to tell pack-objects that even though it should generate on-disk
files, it is still OK to use the reachability bitmaps to do the
traversal.
So we can teach pack-objects to use bitmap index for initial object
counting phase when generating resultant pack file too:
- if we take care to not let it be activated under git-repack:
See above about repack robustness and not forward-carrying corruption.
- if we know bitmap index generation is not enabled for resultant pack:
The current code has singleton bitmap_git, so it cannot work
simultaneously with two bitmap indices.
We also want to avoid (at least with current implementation)
generating bitmaps off of bitmaps. The reason here is: when generating
a pack, not-yet-packed objects will be emitted into pack in
suboptimal order and added to tail of the bitmap as "extended entries".
When the resultant pack + some new objects in associated repository
are in turn used to generate another pack with bitmap, the situation
repeats: new objects are again not emitted optimally and just added to
bitmap tail - not in recency order.
So the pack badness can grow over time when at each step we have
bitmapped pack + some other objects. That's why we want to avoid
generating bitmaps off of bitmaps, not to let pack badness grow.
- if we keep pack reuse enabled still only for "send-to-stdout" case:
Because pack-to-file needs to generate index for destination pack, and
currently on pack reuse raw entries are directly written out to the
destination pack by write_reused_pack(), bypassing needed for pack index
generation bookkeeping done by regular codepath in write_one() and
friends.
( In the future we might teach pack-reuse code about cases when index
also needs to be generated for resultant pack and remove
pack-reuse-only-for-stdout limitation )
This way for pack-objects -> file we get nice speedup:
erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup
repository managed by git-backup[2] via
time echo 0186ac99 | git pack-objects --revs erp5pack
before: 37.2s
after: 26.2s
And for `git repack -adb` packed git.git
time echo 5c589a73 | git pack-objects --revs gitpack
before: 7.1s
after: 3.6s
i.e. it can be 30% - 50% speedup for pack extraction.
git-backup extracts many packs on repositories restoration. That was my
initial motivation for the patch.
[1] https://lab.nexedi.com/nexedi/erp5
[2] https://lab.nexedi.com/kirr/git-backup
NOTE
Jeff also suggests that pack.useBitmaps was probably a mistake to
introduce originally. This way we are not adding another config point,
but instead just always default to-file pack-objects not to use bitmap
index: Tools which need to generate on-disk packs with using bitmap, can
pass --use-bitmap-index explicitly. And git-repack does never pass
--use-bitmap-index, so this way we can be sure regular on-disk repacking
remains robust.
NOTE2
`git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower
than `git pack-objects file.pack`. Extracting erp5.git pack from
lab.nexedi.com backup repository:
$ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack
real 0m22.309s
user 0m21.148s
sys 0m0.932s
$ time git index-pack erp5pack-stdout.pack
real 0m50.873s <-- more than 2 times slower than time to generate pack itself!
user 0m49.300s
sys 0m1.360s
So the time for
`pack-object --stdout >file.pack` + `index-pack file.pack` is 72s,
while
`pack-objects file.pack` which does both pack and index is 27s.
And even
`pack-objects --no-use-bitmap-index file.pack` is 37s.
Jeff explains:
The packfile does not carry the sha1 of the objects. A receiving
index-pack has to compute them itself, including inflating and applying
all of the deltas.
that's why for `git-backup restore` we want to teach `git pack-objects
file.pack` to use bitmaps instead of using `git pack-objects --stdout
>file.pack` + `git index-pack file.pack`.
NOTE3
The speedup is now tracked via t/perf/p5310-pack-bitmaps.sh
Test 56dfeb62 this tree
--------------------------------------------------------------------------------
5310.2: repack to disk 8.98(8.05+0.29) 9.05(8.08+0.33) +0.8%
5310.3: simulated clone 2.02(2.27+0.09) 2.01(2.25+0.08) -0.5%
5310.4: simulated fetch 0.81(1.07+0.02) 0.81(1.05+0.04) +0.0%
5310.5: pack to file 7.58(7.04+0.28) 7.60(7.04+0.30) +0.3%
5310.6: pack to file (bitmap) 7.55(7.02+0.28) 3.25(2.82+0.18) -57.0%
5310.8: clone (partial bitmap) 1.83(2.26+0.12) 1.82(2.22+0.14) -0.5%
5310.9: pack to file (partial bitmap) 6.86(6.58+0.30) 2.87(2.74+0.20) -58.2%
More context:
http://marc.info/?t=146792101400001&r=1&w=2http://public-inbox.org/git/20160707190917.20011-1-kirr@nexedi.com/T/#t
Cc: Vicent Marti <tanoku@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The delta-base-cache mechanism has been a key to the performance in
a repository with a tightly packed packfile, but it did not scale
well even with a larger value of core.deltaBaseCacheLimit.
* jk/delta-base-cache:
t/perf: add basic perf tests for delta base cache
delta_base_cache: use hashmap.h
delta_base_cache: drop special treatment of blobs
delta_base_cache: use list.h for LRU
release_delta_base_cache: reuse existing detach function
clear_delta_base_cache_entry: use a more descriptive name
cache_or_unpack_entry: drop keep_cache parameter
This just shows off the improvements done by the last few
patches, and gives us a baseline for noticing regressions in
the future. Here are the results with linux.git as the perf
"large repo":
Test origin HEAD
-------------------------------------------------------------------
0003.1: log --raw 43.41(40.36+2.69) 33.86(30.96+2.41) -22.0%
0003.2: log -S 313.61(309.74+3.78) 298.75(295.58+3.00) -4.7%
(for a large repo, the "log -S" improvements are greater if
you bump the delta base cache limit, but I think it makes
sense to test the "stock" behavior, since that is what most
people will see).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git rebase" tries to compare set of changes on the updated
upstream and our own branch, it computes patch-id for all of these
changes and attempts to find matches. This has been optimized by
lazily computing the full patch-id (which is expensive) to be
compared only for changes that touch the same set of paths.
* kw/patch-ids-optim:
rebase: avoid computing unnecessary patch IDs
patch-ids: add flag to create the diff patch id using header only data
patch-ids: replace the seen indicator with a commit pointer
patch-ids: stop using a hand-rolled hashmap implementation
The `rebase` family of Git commands avoid applying patches that were
already integrated upstream. They do that by using the revision walking
option that computes the patch IDs of the two sides of the rebase
(local-only patches vs upstream-only ones) and skipping those local
patches whose patch ID matches one of the upstream ones.
In many cases, this causes unnecessary churn, as already the set of
paths touched by a given commit would suffice to determine that an
upstream patch has no local equivalent.
This hurts performance in particular when there are a lot of upstream
patches, and/or large ones.
Therefore, let's introduce the concept of a "diff-header-only" patch ID,
compare those first, and only evaluate the "full" patch ID lazily.
Please note that in contrast to the "full" patch IDs, those
"diff-header-only" patch IDs are prone to collide with one another, as
adjacent commits frequently touch the very same files. Hence we now
have to be careful to allow multiple hash entries with the same hash.
We accomplish that by using the hashmap_add() function that does not even
test for hash collisions. This also allows us to evaluate the full patch ID
lazily, i.e. only when we found commits with matching diff-header-only
patch IDs.
We add a performance test that demonstrates ~1-6% improvement. In
practice this will depend on various factors such as how many upstream
changes and how big those changes are along with whether file system
caches are cold or warm. As Git's test suite has no way of catching
performance regressions, we also add a regression test that verifies
that the full patch ID computation is skipped when the diff-header-only
computation suffices.
Signed-off-by: Kevin Willford <kcwillford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git's pack storage does efficient (log n) lookups in a
single packfile's index, but if we have multiple packfiles,
we have to linearly search each for a given object. This
patch introduces some timing tests for cases where we have a
large number of packs, so that we can measure any
improvements we make in the following patches.
The main thing we want to time is object lookup. To do this,
we measure "git rev-list --objects --all", which does a
fairly large number of object lookups (essentially one per
object in the repository).
However, we also measure the time to do a full repack, which
is interesting for two reasons. One is that in addition to
the usual pack lookup, it has its own linear iteration over
the list of packs. And two is that because it it is the tool
one uses to go from an inefficient many-pack situation back
to a single pack, we care about its performance not only at
marginal numbers of packs, but at the extreme cases (e.g.,
if you somehow end up with 5,000 packs, it is the only way
to get back to 1 pack, so we need to make sure it performs
well).
We measure the performance of each command in three
scenarios: 1 pack, 50 packs, and 1,000 packs.
The 1-pack case is a baseline; any optimizations we do to
handle multiple packs cannot possibly perform better than
this.
The 50-pack case is as far as Git should generally allow
your repository to go, if you have auto-gc enabled with the
default settings. So this represents the maximum performance
improvement we would expect under normal circumstances.
The 1,000-pack case is hopefully rare, though I have seen it
in the wild where automatic maintenance was broken for some
time (and the repository continued to receive pushes). This
represents cases where we care less about general
performance, but want to make sure that a full repack
command does not take excessively long.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow t/perf framework to use the features from the most recent
version of Git even when testing an older installed version.
* jk/perf-any-version:
p4211: explicitly disable renames in no-rename test
t/perf: fix regression in testing older versions of git
p4211 tests line-log performance both with and without "-M".
In v2.9.0, the case without "-M" appears to have regressed
badly, but that is only because we flipped on renames by
default.
Let's have the test explicitly disable renames to get
consistent timings (and to match the presumed intent of the
test, which is to see the effects with and without renames).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 7501b59 (perf: make the tests work in worktrees,
2016-05-13) introduced the use of "git rev-parse --git-path"
in the perf-lib setup code. Because the to-be-tested version
of git is at the front of the $PATH when this code runs,
this means we cannot use modern versions of t/perf to test
versions of git older than v2.5.0 (when that option was
introduced).
This is a symptom of a more general problem. The t/perf
suite is essentially independent of git versions, and
ideally we would be able to run the most modern and complete
set of tests across many historical versions (to see how
they compare). But any setup code they run is therefore
required to use the lowest common denominator we expect to
test.
So let's introduce a new variable, $MODERN_GIT, that we can
use both in perf-lib and in the test setup to get a reliable
set of git features (we might change git and break some
tests, of course, but $MODERN_GIT is tied to the same
version of git as the t/perf scripts, so they can be fixed
or adjusted together).
This commit fixes the "--git-path" case, but does not
mass-convert existing setup code to use $MODERN_GIT. Most
setup code is fairly vanilla and will work with effectively
all versions. But now the tool is there to fix any other
issues we find going forward.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As this developer has no access to MacOSX developer setups anymore,
Travis becomes the best bet to run performance tests on that OS.
However, on MacOSX /usr/bin/time is that good old BSD executable that
no Linux user cares about, as demonstrated by the perf-lib.sh's use
of GNU-ish extensions. And by the hard-coded path.
Let's just work around this issue by using gtime on MacOSX, the
Homebrew-provided GNU implementation onto which pretty much every
MacOSX power user falls back anyway.
To help other developers use Travis to run performance tests on
MacOSX, the .travis.yml file now sports a commented-out line that
installs GNU time via Homebrew.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In regular repositories $source_git and $objects_dir contain relative
paths based on $source. Go there to allow cp to resolve them.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This developer spent a lot of time trying to speed up the interactive
rebase, in particular on Windows. And will continue to do so.
To make it easier to demonstrate the performance improvement, let's have
a reproducible performance test.
The topic branch we use to test performance was found using these shell
commands (essentially searching for a long-enough topic branch in Git's
own history that touched the same file multiple times):
git rev-list --parents origin/master |
grep ' .* ' |
while read commit rest
do
patch_count=$(git rev-list --count $commit^..$commit^2)
test $patch_count -gt 20 || continue
merges="$(git rev-list --parents $commit^..$commit^2 |
grep ' .* ')"
test -z "$merges" || continue
patches_per_file="$(git log --pretty=%H --name-only \
$commit^..$commit^2 |
grep -v '^$' |
sort |
uniq -c -d |
sort -n -r)"
test -n "$patches_per_file" &&
test 20 -lt $(echo "$patches_per_file" |
sed -n '1s/^ *\([0-9]*\).*/\1/p') || continue
printf 'commit %s\n%s\n' "$commit" "$patches_per_file"
done
Note that we can get away with *not* having to reset to the original
branch tip before rebasing: we switch the first two "pick" lines every
time, so we end up with the same patch order after two rebases, and the
complexity of both rebases is equivalent.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch makes perf-lib.sh more robust so that it can run correctly
even inside a worktree. For example, it assumed that $GIT_DIR/objects is
the objects directory (which is not the case for worktrees) and it used
the commondir file verbatim, even if it contained a relative path.
Furthermore, the setup code expected `git rev-parse --git-dir` to spit
out a relative path, which is also not true for worktrees. Let's just
change the code to accept both relative and absolute paths, by avoiding
the `cd` into the copied working directory.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already have a perfectly fine prereq to tell us whether it is safe to
use symlinks. So let's use it.
This fixes the performance tests in Git for Windows' SDK, where symlinks
are not really available ([*1*]). This is not an issue with Git for
Windows itself because it configures core.symlinks=false in its system
config. However, the system config is disabled for the performance
tests, for obvious reasons: we want them to be independent of the
vagaries of any local configuration.
Footnote *1*: Windows has symbolic links. Git for Windows disables them
by default, though (for example: in standard setups, non-admins lack the
privilege to create symbolic links). For details, see
https://github.com/git-for-windows/git/wiki/Symbolic-Links
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we want to look up a submodule ref, we use
get_ref_cache(path) to find or auto-create its ref cache.
But if we feed a path that isn't actually a git repository,
we blindly create the ref cache, and then may die deeper in
the code when we try to access it. This is a problem because
many callers speculatively feed us a path that looks vaguely
like a repository, and expect us to tell them when it is
not.
This patch teaches resolve_gitlink_ref to reject
non-repository paths without creating a ref_cache. This
avoids the die(), and also performs better if you have a
large number of these faux-submodule directories (because
the ref_cache lookup is linear, under the assumption that
there won't be a large number of submodules).
To accomplish this, we also break get_ref_cache into two
pieces: the lookup and auto-creation (the latter is lumped
into create_ref_cache). This lets us first cheaply ask our
cache "is it a submodule we know about?" If so, we can avoid
repeating our filesystem lookup. So lookups of real
submodules are not penalized; they examine the submodule's
.git directory only once.
The test in t3000 demonstrates a case where this improves
correctness (we used to just die). The new perf case in
p7300 shows off the speed improvement in an admittedly
pathological repository:
Test HEAD^ HEAD
----------------------------------------------------------------
7300.4: ls-files -o 66.97(66.15+0.87) 0.33(0.08+0.24) -99.5%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the user specifies an index filter but not a tree filter,
filter-branch cleverly avoids checking out the tree
entirely. But we don't do the next level of optimization: if
you have no index or tree filter, we do not need to read the
index at all.
This can greatly speed up cases where we are only changing
the commit objects (e.g., cementing a graft into place).
Here are numbers from the newly-added perf test:
Test HEAD^ HEAD
---------------------------------------------------------------
7000.2: noop filter 13.81(4.95+0.83) 5.43(0.42+0.43) -60.7%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Performance-measurement tests did not work without an installed Git.
* sb/perf-without-installed-git:
t/perf: make runner work even if Git is not installed
aggregate.perl did not work when Git.pm is not installed to a directory
contained in the default Perl library path list or PERLLIB.
This commit prepends the Perl library path of the current Git source
tree to enable this.
Note that this commit adds a hard-coded relative path
use lib '../../perl/blib/lib';
instead of the flexible environment-based variant
use lib (split(/:/, $ENV{GITPERLLIB}));
which is used in tests written in Perl.
The hard-coded variant is used because the whole performance test
framework does it that way (and GITPERLLIB is not set there).
Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace "is this subdirectory a separate repository that should not
be touched?" check "git clean" does by checking if it has .git/HEAD
using the submodule-related code with a more optimized check.
* ee/clean-remove-dirs:
read_gitfile_gently: fix use-after-free
clean: improve performance when removing lots of directories
p7300: add performance tests for clean
t7300: add tests to document behavior of clean and nested git
setup: sanity check file size in read_gitfile_gently
setup: add gentle version of read_gitfile
The tests are run in dry-run mode to avoid having to restore the test
directories for each timed iteration. Using dry-run is an acceptable
compromise since we are mostly interested in the initial computation
of what to clean and not so much in the cleaning it self.
Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When copying the test repository, we try to detect whether
the copy succeeded. However, most of the heavy lifting is
done inside a for loop, where our "break" will lose the exit
code of the failing "cp". We can take advantage of the fact
that we are in a subshell, and just "exit 1" to break out
with a code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently have pack.writeBitmaps, which originally
operated at the pack-objects level. This should really have
been a repack.* option from day one. Let's give it the more
sensible name, but keep the old version as a deprecated
synonym.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Git CodingGuidelines prefer the $(...) construct for command
substitution instead of using the backquotes `...`.
The backquoted form is the traditional method for command
substitution, and is supported by POSIX. However, all but the
simplest uses become complicated quickly. In particular, embedded
command substitutions and/or the use of double quotes require
careful escaping with the backslash character.
The patch was generated by:
for _f in $(find . -name "*.sh")
do
sed -i 's@`\(.*\)`@$(\1)@g' ${_f}
done
and then carefully proof-read.
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.
* jk/pack-bitmap: (26 commits)
ewah: unconditionally ntohll ewah data
ewah: support platforms that require aligned reads
read-cache: use get_be32 instead of hand-rolled ntoh_l
block-sha1: factor out get_be and put_be wrappers
do not discard revindex when re-preparing packfiles
pack-bitmap: implement optional name_hash cache
t/perf: add tests for pack bitmaps
t: add basic bitmap functionality tests
count-objects: recognize .bitmap in garbage-checking
repack: consider bitmaps when performing repacks
repack: handle optional files created by pack-objects
repack: turn exts array into array-of-struct
repack: stop using magic number for ARRAY_SIZE(exts)
pack-objects: implement bitmap writing
rev-list: add bitmap mode to speed up object lists
pack-objects: use bitmaps when packing objects
pack-objects: split add_object_entry
pack-bitmap: add support for bitmap indexes
documentation: add documentation for the bitmap format
ewah: compressed bitmap implementation
...
Fix performance regression in v1.8.4.x and later.
* jk/mark-edges-uninteresting:
list-objects: only look at cmdline trees with edge_hint
t/perf: time rev-list with UNINTERESTING commits
We time a straight "rev-list --all" and its "--object"
counterpart, both going all the way to the root. However, we
do not time a partial history walk. This patch adds an
extreme case: a walk over a very small slice of history, but
with a very large set of UNINTERESTING tips. This is similar
to the connectivity check run by git on a small fetch, or
the walk done by any pre-receive hooks that want to check
incoming commits.
This test reveals a performance regression in git v1.8.4.2,
caused by fbd4a70 (list-objects: mark more commits as edges
in mark_edges_uninteresting, 2013-08-16):
Test fbd4a703^ fbd4a703
------------------------------------------------------------------------------------------
0001.1: rev-list --all 0.69(0.67+0.02) 0.69(0.68+0.01) +0.0%
0001.2: rev-list --all --objects 3.47(3.44+0.02) 3.48(3.44+0.03) +0.3%
0001.4: rev-list $commit --not --all 0.04(0.04+0.00) 0.04(0.04+0.00) +0.0%
0001.5: rev-list --objects $commit --not --all 0.04(0.03+0.00) 0.27(0.24+0.02) +575.0%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we use pack bitmaps rather than walking the object
graph, we end up with the list of objects to include in the
packfile, but we do not know the path at which any tree or
blob objects would be found.
In a recently packed repository, this is fine. A fetch would
use the paths only as a heuristic in the delta compression
phase, and a fully packed repository should not need to do
much delta compression.
As time passes, though, we may acquire more objects on top
of our large bitmapped pack. If clients fetch frequently,
then they never even look at the bitmapped history, and all
works as usual. However, a client who has not fetched since
the last bitmap repack will have "have" tips in the
bitmapped history, but "want" newer objects.
The bitmaps themselves degrade gracefully in this
circumstance. We manually walk the more recent bits of
history, and then use bitmaps when we hit them.
But we would also like to perform delta compression between
the newer objects and the bitmapped objects (both to delta
against what we know the user already has, but also between
"new" and "old" objects that the user is fetching). The lack
of pathnames makes our delta heuristics much less effective.
This patch adds an optional cache of the 32-bit name_hash
values to the end of the bitmap file. If present, a reader
can use it to match bitmapped and non-bitmapped names during
delta compression.
Here are perf results for p5310:
Test origin/master HEAD^ HEAD
-------------------------------------------------------------------------------------------------
5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7%
5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5%
5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2%
5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9%
You can see that the time spent on an incremental fetch goes
down, as our delta heuristics are able to do their work.
And we save time on the partial bitmap clone for the same
reason.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds a few basic perf tests for the pack bitmap code to
show off its improvements. The tests are:
1. How long does it take to do a repack (it gets slower
with bitmaps, since we have to do extra work)?
2. How long does it take to do a clone (it gets faster
with bitmaps)?
3. How does a small fetch perform when we've just
repacked?
4. How does a clone perform when we haven't repacked since
a week of pushes?
Here are results against linux.git:
Test origin/master this tree
-----------------------------------------------------------------------
5310.2: repack to disk 33.64(32.64+2.04) 67.67(66.75+1.84) +101.2%
5310.3: simulated clone 30.49(29.47+2.05) 1.20(1.10+0.10) -96.1%
5310.4: simulated fetch 3.49(6.79+0.06) 5.57(22.35+0.07) +59.6%
5310.6: partial bitmap 36.70(43.87+1.81) 8.18(21.92+0.73) -77.7%
You can see that we do take longer to repack, but we do way
better for further clones. A small fetch performs a bit
worse, as we spend way more time on delta compression (note
the heavy user CPU time, as we have 8 threads) due to the
lack of name hashes for the bitmapped objects.
The final test shows how the bitmaps degrade over time
between packs. There's still a significant speedup over the
non-bitmap case, but we don't do quite as well (we have to
spend time accessing the "new" objects the old fashioned
way, including delta compression).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff ../else/where/A ../else/where/B" when ../else/where is
clearly outside the repository, and "git diff --no-index A B", do
not have to look at the index at all, but we used to read the index
unconditionally.
* tg/diff-no-index-refactor:
diff: avoid some nesting
diff: add test for --no-index executed outside repo
diff: don't read index when --no-index is given
diff: move no-index detection to builtin/diff.c
git diff --no-index ... currently reads the index, during setup, when
calling gitmodules_config(). This results in worse performance when the
index is not actually needed. This patch avoids calling
gitmodules_config() when the --no-index option is given. The times for
executing "git diff --no-index" in the WebKit repository are improved as
follows:
Test HEAD~3 HEAD
------------------------------------------------------------------
4001.1: diff --no-index 0.24(0.15+0.09) 0.01(0.00+0.00) -95.8%
An additional improvement of this patch is that "git diff --no-index" no
longer breaks when the index file is corrupt, which makes it possible to
use it for investigating the broken repository.
To improve the possible usage as investigation tool for broken
repositories, setup_git_directory_gently() is also not called when the
--no-index option is given.
Also add a test to guard against future breakages, and a performance
test to show the improvements.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A #! line in these files is misleading, since these scriptlets are
meant to be sourced with '.' (using whatever shell sources them)
instead of run directly using the interpreter named on the #! line.
Removing the #! line shouldn't hurt syntax highlighting since
these files have filenames ending with '.sh'. For documentation,
add a brief description of how the files are meant to be used in
place of the shebang line.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`echo -n` is non-portable. The POSIX specification says:
Conforming applications that wish to do prompting without <newline>
characters or that could possibly be expecting to echo a -n, should
use the printf utility derived from the Ninth Edition system.
Since all of the affected shell scripts use a POSIX shell shebang,
replace `echo -n` invocations with printf.
Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allows N instances of tests run in parallel, each running 1/N parts
of the test suite under Valgrind, to speed things up.
* tr/test-v-and-v-subtest-only:
perf-lib: fix start/stop of perf tests
test-lib: support running tests under valgrind in parallel
test-lib: allow prefixing a custom string before "ok N" etc.
test-lib: valgrind for only tests matching a pattern
test-lib: verbose mode for only tests matching a pattern
test-lib: self-test that --verbose works
test-lib: rearrange start/end of test_expect_* and test_skip
test-lib: refactor $GIT_SKIP_TESTS matching
test-lib: enable MALLOC_* for the actual tests
ae75342 test-lib: rearrange start/end of test_expect_* and test_skip
changed the way tests are started/stopped, but did not update the perf
tests. They were therefore giving the wrong output, because of the
wrong test count. Fix this by starting and stopping the tests
correctly.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Acked-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 3.x tree has been out for a while now. The -2.6 repository name
survived the initial release [1], but kernel.org now only lists
'linux.git' (for aegl as well as torvalds) [2].
[1]: http://article.gmane.org/gmane.linux.kernel/1147422
On 2011-05-30 01:47:57 GMT, Linus Torvalds wrote:
> ... yes, that means that my git tree is still called
> "linux-2.6.git" on kernel.org.
[2]: http://git.kernel.org/cgit/
Signed-off-by: W. Trevor King <wking@tremily.us>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the helper test-read-cache, which can be used to call read_cache and
discard_cache in a loop as well as a performance check based on it.
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tr/line-log:
git-log(1): remove --full-line-diff description
line-log: fix documentation formatting
log -L: improve comments in process_all_files()
log -L: store the path instead of a diff_filespec
log -L: test merge of parallel modify/rename
t4211: pass -M to 'git log -M -L...' test
log -L: fix overlapping input ranges
log -L: check range set invariants when we look it up
Speed up log -L... -M
log -L: :pattern:file syntax to find by funcname
Implement line-history search (git log -L)
Export rewrite_parents() for 'log -L'
Refactor parse_loc
This is a rewrite of much of Bo's work, mainly in an effort to split
it into smaller, easier to understand routines.
The algorithm is built around the struct range_set, which encodes a
series of line ranges as intervals [a,b). This is used in two
contexts:
* A set of lines we are tracking (which will change as we dig through
history).
* To encode diffs, as pairs of ranges.
The main routine is range_set_map_across_diff(). It processes the
diff between a commit C and some parent P. It determines which diff
hunks are relevant to the ranges tracked in C, and computes the new
ranges for P.
The algorithm is then simply to process history in topological order
from newest to oldest, computing ranges and (partial) diffs. At
branch points, we need to merge the ranges we are watching. We will
find that many commits do not affect the chosen ranges, and mark them
TREESAME (in addition to those already filtered by pathspec limiting).
Another pass of history simplification then gets rid of such commits.
This is wired as an extra filtering pass in the log machinery. This
currently only reduces code duplication, but should allow for other
simplifications and options to be used.
Finally, we hook a diff printer into the output chain. Ideally we
would wire directly into the diff logic, to optionally use features
like word diff. However, that will require some major reworking of
the diff chain, so we completely replace the output with our own diff
for now.
As this was a GSoC project, and has quite some history by now, many
people have helped. In no particular order, thanks go to
Jakub Narebski <jnareb@gmail.com>
Jens Lehmann <Jens.Lehmann@web.de>
Jonathan Nieder <jrnieder@gmail.com>
Junio C Hamano <gitster@pobox.com>
Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Will Palmer <wmpalmer@gmail.com>
Apologies to everyone I forgot.
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently the documentation of GIT_PERF_REPEAT_COUNT says the default is
five while "perf-lib.sh" uses a value of three as a default.
Update the documentation so that it is consistent with the code.
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The malloc checks in tests are currently disabled. Actually evaluate
the variable for turning them off and enable them if it's unset.
Also use this opportunity to give it the more descriptive and
consistent name TEST_NO_MALLOC_CHECK.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>