git/builtin
Patrick Steinhardt bcec6780b2 receive-pack: only use visible refs for connectivity check
When serving a push, git-receive-pack(1) needs to verify that the
packfile sent by the client contains all objects that are required by
the updated references. This connectivity check works by marking all
preexisting references as uninteresting and using the new reference tips
as starting point for a graph walk.

Marking all preexisting references as uninteresting can be a problem
when it comes to performance. Git forges tend to do internal bookkeeping
to keep sets of objects alive for internal use or to make them easy to
find via certain references. These references are typically hidden away from
the user so that they are neither advertised nor writeable. At GitLab,
we have one particular repository that contains a total of 7 million
references, of which 6.8 million are indeed internal references. With
the current connectivity check we are forced to load all these
references in order to mark them as uninteresting, and this alone takes
around 15 seconds to compute.
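Such hiding is configured through `transfer.hideRefs`. The sketch below is a hypothetical helper, `ref_is_hidden_sketch()`, modeled on the documented rules rather than copied from git: an entry hides every ref at or below that hierarchy (trailing slashes are ignored), a leading `!` re-exposes matching refs, and later entries override earlier ones. It deliberately ignores the `^` full-name prefix and ref namespaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper modeled on the documented transfer.hideRefs rules;
 * not git's own implementation. '^' prefixes and namespaces are ignored.
 */
static bool ref_is_hidden_sketch(const char *refname,
                                 const char *const *patterns, size_t n)
{
	assert(refname != NULL);

	/* Walk from the last entry so that later entries win. */
	for (size_t i = n; i-- > 0;) {
		const char *match = patterns[i];
		bool neg = false;

		if (*match == '!') {
			neg = true;	/* "!refs/foo" explicitly exposes refs/foo */
			match++;
		}

		size_t len = strlen(match);
		while (len > 0 && match[len - 1] == '/')
			len--;		/* "refs/foo/" behaves like "refs/foo" */

		/* Match whole hierarchy components: "refs/keep" must not hide "refs/keepers". */
		if (strncmp(refname, match, len) == 0 &&
		    (refname[len] == '\0' || refname[len] == '/'))
			return !neg;
	}
	return false;	/* no entry matched: the ref is visible */
}
```

With illustrative patterns such as `refs/keep-around/` and `refs/merge-requests` plus a negated `!refs/merge-requests/1`, a ref like `refs/merge-requests/2/head` is hidden, `refs/merge-requests/1/head` is exposed again, and `refs/heads/main` stays visible.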

We can optimize this by only taking into account the set of visible refs
when marking objects as uninteresting. This means that we may now walk
more objects before we hit one that is marked as uninteresting. But it
is rather unlikely that clients send objects that make large parts of
previously hidden-only history reachable; the common case is to push
incremental changes that build on top of the visible object graph.
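The trade-off can be illustrated with a toy model. The sketch below is hypothetical and not git's code (git performs this walk with its revision machinery): `check_connected_sketch()` marks the tips of preexisting refs as uninteresting, optionally skipping hidden ones, then walks from the pushed tips and fails if it reaches an object that is absent. As a simplification, only the ref tips themselves stop the walk; git also propagates the uninteresting mark to ancestors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_COMMITS 16
#define MAX_PARENTS 2

/* Toy commit: up to two parents, -1 meaning "no parent". */
struct toy_commit {
	int parents[MAX_PARENTS];
	bool present;		/* exists in the repo or in the received pack */
	bool uninteresting;	/* reachable from a preexisting ref tip */
	bool seen;
};

struct toy_ref {
	int tip;
	bool hidden;		/* e.g. matched by transfer.hideRefs */
};

/*
 * Returns true if every commit walked from new_tips is present. With
 * visible_only set, hidden refs are not used as stopping points, so the
 * walk may visit more commits. *walked receives the number visited.
 */
static bool check_connected_sketch(struct toy_commit *g, size_t ncommits,
				   const struct toy_ref *refs, size_t nrefs,
				   const int *new_tips, size_t ntips,
				   bool visible_only, size_t *walked)
{
	/* Each commit is expanded once, pushing at most MAX_PARENTS entries. */
	int stack[MAX_COMMITS + MAX_COMMITS * MAX_PARENTS];
	size_t top = 0;

	assert(ncommits <= MAX_COMMITS && ntips <= MAX_COMMITS);
	for (size_t i = 0; i < ncommits; i++)
		g[i].uninteresting = g[i].seen = false;

	/* Mark preexisting ref tips uninteresting, optionally visible ones only. */
	for (size_t i = 0; i < nrefs; i++)
		if (!(visible_only && refs[i].hidden))
			g[refs[i].tip].uninteresting = true;

	for (size_t i = 0; i < ntips; i++)
		stack[top++] = new_tips[i];

	*walked = 0;
	while (top > 0) {
		int c = stack[--top];

		if (g[c].seen || g[c].uninteresting)
			continue;	/* already checked, or assumed connected */
		g[c].seen = true;
		(*walked)++;
		if (!g[c].present)
			return false;	/* the push needs an object we do not have */
		for (int p = 0; p < MAX_PARENTS; p++)
			if (g[c].parents[p] >= 0)
				stack[top++] = g[c].parents[p];
	}
	return true;
}
```

For a push that builds on the visible tip, both modes stop after the new commit. For a push built on top of a commit only reachable from a hidden ref, the visible-only mode walks further down that history, but it still reaches the same verdict because those objects exist.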

This provides a huge boost to performance in the mentioned repository,
where the vast majority of its refs are hidden. Pushing a new commit into
this repo with `transfer.hideRefs` set up to hide 6.8 million of 7 million
refs, as it is configured in Gitaly, leads to a 4.5-fold speedup:

    Benchmark 1: main
      Time (mean ± σ):     30.977 s ±  0.157 s    [User: 30.226 s, System: 1.083 s]
      Range (min … max):   30.796 s … 31.071 s    3 runs

    Benchmark 2: pks-connectivity-check-hide-refs
      Time (mean ± σ):      6.799 s ±  0.063 s    [User: 6.803 s, System: 0.354 s]
      Range (min … max):    6.729 s …  6.850 s    3 runs

    Summary
      'pks-connectivity-check-hide-refs' ran
        4.56 ± 0.05 times faster than 'main'

Because we go through mostly the same codepaths as before when there
are no hidden refs at all, there is no change in performance in that
case:

    Benchmark 1: main
      Time (mean ± σ):     48.188 s ±  0.432 s    [User: 49.326 s, System: 5.009 s]
      Range (min … max):   47.706 s … 48.539 s    3 runs

    Benchmark 2: pks-connectivity-check-hide-refs
      Time (mean ± σ):     48.027 s ±  0.500 s    [User: 48.934 s, System: 5.025 s]
      Range (min … max):   47.504 s … 48.500 s    3 runs

    Summary
      'pks-connectivity-check-hide-refs' ran
        1.00 ± 0.01 times faster than 'main'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-17 16:22:52 -05:00
add.c Merge branch 'ab/plug-leak-in-revisions' 2022-06-07 14:10:56 -07:00
am.c git-compat-util.h: use "UNUSED", not "UNUSED(var)" 2022-09-01 10:49:48 -07:00
annotate.c
apply.c apply.c: remove unnecessary include 2022-04-06 09:42:14 -07:00
archive.c parse-options: PARSE_OPT_KEEP_UNKNOWN only applies to --options 2022-08-19 11:13:14 -07:00
bisect--helper.c Merge branch 'rs/bisect-start-leakfix' 2022-10-17 14:56:32 -07:00
blame.c parse-options: add support for parsing subcommands 2022-08-19 11:13:14 -07:00
branch.c Merge branch 'rj/branch-edit-description-with-nth-checkout' 2022-10-21 11:37:29 -07:00
bugreport.c builtin/bugreport.c: create '--diagnose' option 2022-08-12 13:20:02 -07:00
bundle.c builtin/bundle.c: let parse-options parse subcommands 2022-08-19 11:13:15 -07:00
cat-file.c Merge branch 'tb/cat-file-z' 2022-08-05 15:52:14 -07:00
check-attr.c
check-ignore.c dir.[ch]: replace dir_init() with DIR_INIT 2021-07-01 12:32:22 -07:00
check-mailmap.c
check-ref-format.c check-ref-format: fix trivial memory leak 2022-07-01 11:43:42 -07:00
checkout--worker.c pkt-line.[ch]: remove unused packet_read_line_buf() 2021-10-15 13:09:40 -07:00
checkout-index.c checkout-index: integrate with sparse index 2022-01-13 13:49:45 -08:00
checkout.c git-compat-util.h: use "UNUSED", not "UNUSED(var)" 2022-09-01 10:49:48 -07:00
clean.c Merge branch 'vd/sparse-clean-etc' 2022-02-17 16:25:05 -08:00
clone.c Git 2.38.1 2022-10-17 15:46:09 -07:00
column.c column: fix parsing of the '--nl' option 2021-08-26 14:36:27 -07:00
commit-graph.c Merge branch 'ab/unused-annotation' 2022-09-14 12:56:39 -07:00
commit-tree.c use xopen() to handle fatal open(2) failures 2021-08-25 14:39:08 -07:00
commit.c commit: avoid writing to global in option callback 2022-10-06 09:58:06 -07:00
config.c git-compat-util.h: use "UNUSED", not "UNUSED(var)" 2022-09-01 10:49:48 -07:00
count-objects.c i18n: remove from i18n strings that do not hold translatable parts 2022-02-04 13:58:28 -08:00
credential-cache--daemon.c unix-socket: add backlog size option to unix_stream_listen() 2021-03-15 14:32:51 -07:00
credential-cache.c credential-cache: check for windows specific errors 2021-09-14 09:30:54 -07:00
credential-store.c Use a better name for the function interpolating paths 2021-07-26 12:17:16 -07:00
credential.c doc: fix git credential synopsis 2021-10-28 09:57:09 -07:00
describe.c git-compat-util.h: use "UNUSED", not "UNUSED(var)" 2022-09-01 10:49:48 -07:00
diagnose.c builtin/diagnose.c: don't translate the two mode values 2022-09-21 11:53:35 -07:00
diff-files.c diff-files: move misplaced cleanup label 2022-07-12 07:17:28 -07:00
diff-index.c revisions API: call diff_free(&revs->pruning) in revisions_release() 2022-04-13 23:56:10 -07:00
diff-tree.c 2.36 gitk/diff-tree --stdin regression fix 2022-04-26 09:26:35 -07:00
diff.c diff: support ^! for merges 2022-10-01 15:58:38 -07:00
difftool.c Merge branch 'ab/unused-annotation' 2022-09-14 12:56:39 -07:00
env--helper.c parse-options: PARSE_OPT_KEEP_UNKNOWN only applies to --options 2022-08-19 11:13:14 -07:00
fast-export.c Merge branch 'ab/unused-annotation' 2022-09-14 12:56:39 -07:00
fast-import.c git-compat-util.h: use "UNUSED", not "UNUSED(var)" 2022-09-01 10:49:48 -07:00
fetch-pack.c list-objects-filter: add and use initializers 2022-09-12 08:38:59 -07:00
fetch.c run-command API: move *_tr2() users to "run_processes_parallel()" 2022-10-12 14:12:41 -07:00
fmt-merge-msg.c merge: allow to pretend a merge is made into a different branch 2021-12-20 14:55:02 -08:00
for-each-ref.c for-each-ref: delay parsing of --sort=<atom> options 2021-10-20 14:33:07 -07:00
for-each-repo.c builtin/for-each-repo: remove unnecessary argv copy to plug leak 2021-07-26 12:19:20 -07:00
fsck.c Merge branch 'jk/fsck-on-diet' 2022-10-10 10:08:39 -07:00
fsmonitor--daemon.c Merge branch 'ed/fsmonitor-on-networked-macos' 2022-10-17 14:56:31 -07:00
gc.c Merge branch 'jk/unused-anno-more' 2022-10-27 14:51:52 -07:00
get-tar-commit-id.c
grep.c builtin/grep.c: integrate with sparse index 2022-09-23 09:41:27 -07:00
hash-object.c Merge branch 'ab/object-file-api-updates' 2022-03-16 17:53:08 -07:00
help.c git help: special-case scalar 2022-09-02 10:02:56 -07:00
hook.c builtin/hook.c: let parse-options parse subcommands 2022-08-19 11:13:15 -07:00
index-pack.c i18n: fix mismatched camelCase config variables 2022-06-17 10:38:26 -07:00
init-db.c i18n: refactor "foo and bar are mutually exclusive" 2022-01-05 13:29:23 -08:00
interpret-trailers.c
log.c Merge branch 'ab/unused-annotation' 2022-09-14 12:56:39 -07:00
ls-files.c ls-files: fix black space in error message 2022-09-12 09:25:40 -07:00
ls-remote.c Merge branch 'ep/maint-equals-null-cocci' 2022-05-20 15:26:59 -07:00
ls-tree.c git-compat-util.h: use "UNUSED", not "UNUSED(var)" 2022-09-01 10:49:48 -07:00
mailinfo.c mailinfo: allow squelching quoted CRLF warning 2021-05-10 15:06:22 +09:00
mailsplit.c Merge branch 'ep/maint-equals-null-cocci' 2022-05-20 15:26:59 -07:00
merge-base.c merge-base: free() allocated "struct commit **" list 2022-03-04 13:24:17 -08:00
merge-file.c merge-file: fix memory leaks on error path 2022-07-01 11:43:43 -07:00
merge-index.c merge-index: ensure full index 2021-04-14 13:47:21 -07:00
merge-ours.c builtins + test helpers: use return instead of exit() in cmd_* 2021-06-09 09:15:58 +09:00
merge-recursive.c gettext API users: don't explicitly cast ngettext()'s "n" 2022-03-07 11:57:52 -08:00
merge-tree.c merge-tree: add a --allow-unrelated-histories flag 2022-06-22 16:10:06 -07:00
merge.c Merge branch 'en/merge-unstash-only-on-clean-merge' into maint 2022-09-13 12:21:11 -07:00
mktag.c Merge branch 'ab/object-file-api-updates' 2022-03-16 17:53:08 -07:00
mktree.c mktree: do not check type of remote objects 2022-06-21 10:12:15 -07:00
multi-pack-index.c multi-pack-index: avoid writing to global in option callback 2022-10-06 09:56:51 -07:00
mv.c Merge branch 'sy/mv-out-of-cone' 2022-09-19 14:35:23 -07:00
name-rev.c git-compat-util.h: use "UNUSED", not "UNUSED(var)" 2022-09-01 10:49:48 -07:00
notes.c notes, remote: show unknown subcommands between `' 2022-09-07 12:06:12 -07:00
pack-objects.c Merge branch 'ab/unused-annotation' 2022-09-14 12:56:39 -07:00
pack-redundant.c tree-wide: apply equals-null.cocci 2022-05-02 09:50:37 -07:00
pack-refs.c
patch-id.c patch-id: fix scan_hunk_header on diffs with 1 line of before/after 2022-02-02 11:24:23 -08:00
prune-packed.c i18n: remove from i18n strings that do not hold translatable parts 2022-02-04 13:58:28 -08:00
prune.c revisions API users: add straightforward release_revisions() 2022-04-13 23:56:08 -07:00
pull.c pull: fix a "struct oid_array" memory leak 2022-07-01 11:43:43 -07:00
push.c push: improve grammar of branch.autoSetupMerge advice 2022-09-28 19:03:10 -07:00
range-diff.c range-diff: optionally accept pathspecs 2022-08-26 09:49:26 -07:00
read-tree.c read-tree: make three-way merge sparse-aware 2022-03-01 12:36:01 -08:00
rebase.c rebase: add rebase.updateRefs config option 2022-07-19 12:49:04 -07:00
receive-pack.c receive-pack: only use visible refs for connectivity check 2022-11-17 16:22:52 -05:00
reflog.c refs: unify parse_worktree_ref() and ref_type() 2022-09-19 11:11:11 -07:00
remote-ext.c
remote-fd.c
remote.c Merge branch 'jk/unused-anno-more' 2022-10-27 14:51:52 -07:00
repack.c repack: don't remove .keep packs with --pack-kept-objects 2022-10-17 21:29:23 -07:00
replace.c refs: use ref_namespaces for replace refs base 2022-08-05 14:13:12 -07:00
rerere.c xdiff users: use designated initializers for out_line 2021-05-11 12:47:31 +09:00
reset.c pathspec.h: move pathspec_needs_expanded_index() from reset.c to here 2022-08-08 13:23:26 -07:00
rev-list.c revision: add new parameter to exclude hidden refs 2022-11-17 16:22:52 -05:00
rev-parse.c rev-parse: add --exclude-hidden= option 2022-11-17 16:22:52 -05:00
revert.c parse-options: PARSE_OPT_KEEP_UNKNOWN only applies to --options 2022-08-19 11:13:14 -07:00
rm.c rm: integrate with sparse-index 2022-08-08 13:23:26 -07:00
send-pack.c i18n: factorize "invalid value" messages 2022-02-04 13:58:28 -08:00
shortlog.c parse-options: add support for parsing subcommands 2022-08-19 11:13:14 -07:00
show-branch.c git-compat-util.h: use "UNUSED", not "UNUSED(var)" 2022-09-01 10:49:48 -07:00
show-index.c builtin/show-index: set the algorithm for object IDs 2021-04-27 16:31:39 +09:00
show-ref.c git-compat-util.h: use "UNUSED", not "UNUSED(var)" 2022-09-01 10:49:48 -07:00
sparse-checkout.c pass subcommand "prefix" arguments to parse_options() 2022-08-25 09:43:29 -07:00
stash.c Merge branch 'ab/unused-annotation' 2022-09-14 12:56:39 -07:00
stripspace.c i18n: remove from i18n strings that do not hold translatable parts 2022-02-04 13:58:28 -08:00
submodule--helper.c Merge branch 'ab/run-hook-api-cleanup' 2022-10-27 14:51:53 -07:00
symbolic-ref.c Merge branch 'jc/symbolic-ref-no-recurse' 2022-10-21 11:37:28 -07:00
tag.c Merge branch 'ep/maint-equals-null-cocci' 2022-05-20 15:26:59 -07:00
unpack-file.c
unpack-objects.c unpack-objects: use stream_loose_object() to unpack large objects 2022-06-13 10:22:36 -07:00
update-index.c update-index: drop unused argc from do_reupdate() 2022-10-17 21:24:03 -07:00
update-ref.c update-ref: fix streaming of status updates 2021-09-03 11:35:15 -07:00
update-server-info.c i18n: remove from i18n strings that do not hold translatable parts 2022-02-04 13:58:28 -08:00
upload-archive.c upload-archive: use regular "struct child_process" pattern 2021-11-25 22:15:07 -08:00
upload-pack.c upload-pack: document and rename --advertise-refs 2021-08-05 08:59:37 -07:00
var.c var: add GIT_DEFAULT_BRANCH variable 2021-11-03 13:25:36 -07:00
verify-commit.c
verify-pack.c
verify-tag.c
worktree.c builtin/worktree.c: let parse-options parse subcommands 2022-08-19 11:13:16 -07:00
write-tree.c