git/t/perf/p5600-partial-clone.sh

#!/bin/sh

test_description='performance of partial clones'
. ./perf-lib.sh

test_perf_default_repo

test_expect_success 'enable server-side config' '
	git config uploadpack.allowFilter true &&
	git config uploadpack.allowAnySHA1InWant true
'

test_perf 'clone without blobs' '
	rm -rf bare.git &&
	git clone --no-local --bare --filter=blob:none . bare.git
'

test_perf 'checkout of result' '
	rm -rf worktree &&
	mkdir -p worktree/.git &&
	tar -C bare.git -cf - . | tar -C worktree/.git -xf - &&
	git -C worktree config core.bare false &&
	git -C worktree checkout -f
'

test_perf 'fsck' '
	git -C bare.git fsck
'

test_perf 'count commits' '
	git -C bare.git rev-list --all --count
'

test_perf 'count non-promisor commits' '
	git -C bare.git rev-list --all --count --exclude-promisor-objects
'

test_perf 'gc' '
	git -C bare.git gc
'

test_done
t/perf: add perf script for partial clones We don't cover the partial clone feature at all in t/perf. Let's at least run a few basic tests so that we'll notice any regressions. We'll do a no-blob clone, and split it into two parts: the actual object transfer, and the subsequent checkout (which will of course require another transfer to get the blobs). That will help us more clearly assess the performance of each. There are obviously a lot more possibilities besides just a no-blob partial clone, but this should serve as a canary that alerts us to any generic slow-downs (and we can add more tests later for cases that aren't exercised here). There are a few non-ideal things here that make this not an entirely accurate test, but are probably OK for our purposes: 1. We have to do some extra prep/cleanup work inside the timing tests, since they impact the on-disk state and the perf harness may run each one multiple times. In practice this is probably OK, since these bits should be much less expensive than the operations we are measuring. 2. The clone time is likely to be dominated by the server's object enumeration. In the real world, a repo large enough to drive people to partial clones is likely to have reachability bitmaps enabled. And in the opposite direction, our object transfer is happening at the speed of a local pipe, whereas in the real world it would bottle-neck on the network. So any percentage speedups should be taken with a grain of salt. But hopefully any regressions will produce enough of an effect to be noticeable. This script also demonstrates the recent improvement from dfa33a298d (clone: do faster object check for partial clones, 2019-04-19): Test dfa33a298d^ dfa33a298d ------------------------------------------------------------------------- 5600.2: clone without blobs 18.41(22.72+1.09) 6.83(11.65+0.50) -62.9% 5600.3: checkout of result 1.82(3.24+0.26) 1.84(3.24+0.26) +1.1% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2019-05-02 21:45:09 +00:00			`#!/bin/sh`

			`test_description='performance of partial clones'`
			`. ./perf-lib.sh`

			`test_perf_default_repo`

			`test_expect_success 'enable server-side config' '`
			`git config uploadpack.allowFilter true &&`
			`git config uploadpack.allowAnySHA1InWant true`
			`'`

			`test_perf 'clone without blobs' '`
			`rm -rf bare.git &&`
			`git clone --no-local --bare --filter=blob:none . bare.git`
			`'`

			`test_perf 'checkout of result' '`
			`rm -rf worktree &&`
			`mkdir -p worktree/.git &&`
			`tar -C bare.git -cf - . \| tar -C worktree/.git -xf - &&`
			`git -C worktree config core.bare false &&`
			`git -C worktree checkout -f`
			`'`

is_promisor_object(): free tree buffer after parsing To get the list of all promisor objects, we not only include all objects in promisor packs, but also parse each of those objects to see which objects they reference. After parsing a tree object, the tree->buffer field will remain populated until we explicitly free it. So in a partial clone of blob:none, for example, we are essentially reading every tree in the repository (since they're all in the initial promisor pack), and keeping all of their uncompressed contents in memory at once. This patch frees the tree buffers after we've finished marking all of their reachable objects. We shouldn't need to do this for any other object type. While we are using some extra memory to store the structs, no other object type stores the whole contents in its parsed form (we do sometimes hold on to commit buffers, but less so these days due to commit graphs, plus most commands which care about promisor objects turn off the save_commit_buffer global). Even for a moderate-sized repository like git.git, this patch drops the peak heap (as measured by massif) for git-fsck from ~1.7GB to ~138MB. Fsck is a good candidate for measuring here because it doesn't interact with the promisor code except to call is_promisor_object(), so we can isolate just this problem. The added perf test shows only a tiny improvement on my machine for git.git, since 1.7GB isn't enough to cause any real memory pressure: Test HEAD^ HEAD -------------------------------------------------------------------------------- 5600.4: fsck 21.26(20.90+0.35) 20.84(20.79+0.04) -2.0% With linux.git the absolute change is a bit bigger, though still a small percentage: Test HEAD^ HEAD ----------------------------------------------------------------------------- 5600.4: fsck 262.26(259.13+3.12) 254.92(254.62+0.29) -2.8% I didn't have the patience to run it under massif with linux.git, but it's probably on the order of about 14GB improvement, since that's the sum of the sizes of all of the uncompressed trees (but still isn't enough to create memory pressure on this particular machine, which has 64GB of RAM). Smaller machines would probably see a bigger effect on runtime (and sadly our perf suite does not measure peak heap). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2021-04-13 07:15:54 +00:00			`test_perf 'fsck' '`
			`git -C bare.git fsck`
			`'`

revision: avoid parsing with --exclude-promisor-objects When --exclude-promisor-objects is given, before traversing any objects we iterate over all of the objects in any promisor packs, marking them as UNINTERESTING and SEEN. We turn the oid we get from iterating the pack into an object with parse_object(), but this has two problems: - it's slow; we are zlib inflating (and reconstructing from deltas) every byte of every object in the packfile - it leaves the tree buffers attached to their structs, which means our heap usage will grow to store every uncompressed tree simultaneously. This can be gigabytes. We can obviously fix the second by freeing the tree buffers after we've parsed them. But we can observe that the function doesn't look at the object contents at all! The only reason we call parse_object() is that we need a "struct object" on which to set the flags. There are two options here: - we can look up just the object type via oid_object_info(), and then call the appropriate lookup_foo() function - we can call lookup_unknown_object(), which gives us an OBJ_NONE struct (which will get auto-converted later by object_as_type() via calls to lookup_commit(), etc). The first one is closer to the current code, but we do pay the price to look up the type for each object. The latter should be more efficient in CPU, though it wastes a little bit of memory (the "unknown" object structs are a union of all object types, so some of the structs are bigger than they need to be). It also runs the risk of triggering a latent bug in code that calls lookup_object() directly but isn't ready to handle OBJ_NONE (such code would already be buggy, but we use lookup_unknown_object() infrequently enough that it might be hiding). I went with the second option here. I don't think the risk is high (and we'd want to find and fix any such bugs anyway), and it should be more efficient overall. The new tests in p5600 show off the improvement (this is on git.git): Test HEAD^ HEAD ------------------------------------------------------------------------------- 5600.5: count commits 0.37(0.37+0.00) 0.38(0.38+0.00) +2.7% 5600.6: count non-promisor commits 11.74(11.37+0.37) 0.04(0.03+0.00) -99.7% The improvement is particularly big in this script because _every_ object in the newly-cloned partial repo is a promisor object. So after marking them all, there's nothing left to traverse. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2021-04-13 07:17:48 +00:00			`test_perf 'count commits' '`
			`git -C bare.git rev-list --all --count`
			`'`

			`test_perf 'count non-promisor commits' '`
			`git -C bare.git rev-list --all --count --exclude-promisor-objects`
			`'`

repack: avoid loosening promisor objects in partial clones When `git repack -A -d` is run in a partial clone, `pack-objects` is invoked twice: once to repack all promisor objects, and once to repack all non-promisor objects. The latter `pack-objects` invocation is with --exclude-promisor-objects and --unpack-unreachable, which loosens all objects unused during this invocation. Unfortunately, this includes promisor objects. Because the -d argument to `git repack` subsequently deletes all loose objects also in packs, these just-loosened promisor objects will be immediately deleted. However, this extra disk churn is unnecessary in the first place. For example, in a newly-cloned partial repo that filters all blob objects (e.g. `--filter=blob:none`), `repack` ends up unpacking all trees and commits into the filesystem because every object, in this particular case, is a promisor object. Depending on the repo size, this increases the disk usage considerably: In my copy of the linux.git, the object directory peaked 26GB of more disk usage. In order to avoid this extra disk churn, pass the names of the promisor packfiles as --keep-pack arguments to the second invocation of `pack-objects`. This informs `pack-objects` that the promisor objects are already in a safe packfile and, therefore, do not need to be loosened. For testing, we need to validate whether any object was loosened. However, the "evidence" (loosened objects) is deleted during the process which prevents us from inspecting the object directory. Instead, let's teach `pack-objects` to count loosened objects and emit via trace2 thus allowing inspecting the debug events after the process is finished. This new event is used on the added regression test. Lastly, add a new perf test to evaluate the performance impact made by this changes (tested on git.git): Test HEAD^ HEAD ---------------------------------------------------------- 5600.3: gc 134.38(41.93+90.95) 7.80(6.72+1.35) -94.2% For a bigger repository, such as linux.git, the improvement is even bigger: Test HEAD^ HEAD ------------------------------------------------------------------- 5600.3: gc 6833.00(918.07+3162.74) 268.79(227.02+39.18) -96.1% These improvements are particular big because every object in the newly-cloned partial repository is a promisor object. Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Jeff King <peff@peff.net> Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2021-04-21 19:32:12 +00:00			`test_perf 'gc' '`
			`git -C bare.git gc`
			`'`

t/perf: add perf script for partial clones We don't cover the partial clone feature at all in t/perf. Let's at least run a few basic tests so that we'll notice any regressions. We'll do a no-blob clone, and split it into two parts: the actual object transfer, and the subsequent checkout (which will of course require another transfer to get the blobs). That will help us more clearly assess the performance of each. There are obviously a lot more possibilities besides just a no-blob partial clone, but this should serve as a canary that alerts us to any generic slow-downs (and we can add more tests later for cases that aren't exercised here). There are a few non-ideal things here that make this not an entirely accurate test, but are probably OK for our purposes: 1. We have to do some extra prep/cleanup work inside the timing tests, since they impact the on-disk state and the perf harness may run each one multiple times. In practice this is probably OK, since these bits should be much less expensive than the operations we are measuring. 2. The clone time is likely to be dominated by the server's object enumeration. In the real world, a repo large enough to drive people to partial clones is likely to have reachability bitmaps enabled. And in the opposite direction, our object transfer is happening at the speed of a local pipe, whereas in the real world it would bottle-neck on the network. So any percentage speedups should be taken with a grain of salt. But hopefully any regressions will produce enough of an effect to be noticeable. This script also demonstrates the recent improvement from dfa33a298d (clone: do faster object check for partial clones, 2019-04-19): Test dfa33a298d^ dfa33a298d ------------------------------------------------------------------------- 5600.2: clone without blobs 18.41(22.72+1.09) 6.83(11.65+0.50) -62.9% 5600.3: checkout of result 1.82(3.24+0.26) 1.84(3.24+0.26) +1.1% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2019-05-02 21:45:09 +00:00			`test_done`