git/builtin
Taylor Blau a8dd7e05b1 config: enable pack.writeReverseIndex by default
Back in e37d0b8730 (builtin/index-pack.c: write reverse indexes,
2021-01-25), Git learned how to read and write a pack's reverse index
from a file instead of in-memory.

A pack's reverse index is a mapping from pack position (that is, the
order in which objects appear in a ".pack") to index position (that is,
the lexical order in which objects are listed in an ".idx" file).
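
As a rough illustration of that mapping (a minimal sketch, not Git's C
implementation), a reverse index can be derived in memory by sorting
index positions by pack offset. The function name and the offsets below
are hypothetical.

```python
def build_reverse_index(index_offsets):
    """Return rev such that rev[pack_pos] is the index (lexical) position.

    index_offsets[i] is the pack offset of the i-th object in ".idx" order.
    Sorting index positions by offset yields pack order.
    """
    return sorted(range(len(index_offsets)), key=index_offsets.__getitem__)


# Hypothetical offsets for four objects, listed in ".idx" (lexical) order.
index_offsets = [450, 12, 300, 120]
rev = build_reverse_index(index_offsets)
print(rev)  # [1, 3, 2, 0]
```

An on-disk reverse index stores the result of this sort, so readers do
not have to recompute it each time a pack is opened.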

Reverse indexes are consulted often during pack-objects, as well as
during auxiliary operations that require mapping between pack offsets,
pack order, and index order.
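
The kind of lookup involved can be sketched as follows, assuming a
reverse index held as a list ordered by pack offset. The helper name and
data are illustrative assumptions, not Git's internal API.

```python
def offset_to_pack_pos(index_offsets, rev, offset):
    """Binary-search the reverse index for the object starting at `offset`."""
    lo, hi = 0, len(rev)
    while lo < hi:
        mid = (lo + hi) // 2
        if index_offsets[rev[mid]] < offset:
            lo = mid + 1
        else:
            hi = mid
    if lo == len(rev) or index_offsets[rev[lo]] != offset:
        raise KeyError(f"no object starts at offset {offset}")
    return lo


# Hypothetical offsets in ".idx" order, and the reverse index built from them.
index_offsets = [450, 12, 300, 120]
rev = sorted(range(len(index_offsets)), key=index_offsets.__getitem__)

pack_pos = offset_to_pack_pos(index_offsets, rev, 120)  # offset -> pack order
index_pos = rev[pack_pos]                               # pack order -> index order
print(pack_pos, index_pos)  # 1 3
```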

They are useful in GitHub's infrastructure, where we have seen a
dramatic increase in performance when writing ".rev" files[1]. In
particular:

  - an ~80% reduction in the time it takes to serve fetches on a popular
    repository, Homebrew/homebrew-core.

  - a ~60% reduction in the peak memory usage to serve fetches on that
    same repository.

  - a collective savings of ~35% in CPU time across all pack-objects
    invocations serving fetches across all repositories in a single
    datacenter.

Reverse indexes benefit end-users as well as forges. For example,
generating a pack containing the objects for the 10 most recent commits
in linux.git (representing a typical push) is significantly faster when
on-disk reverse indexes are available:

    $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~10; } >in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
    Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     543.0 ms ±  20.3 ms    [User: 616.2 ms, System: 58.8 ms]
      Range (min … max):   521.0 ms … 577.9 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     245.0 ms ±  11.4 ms    [User: 335.6 ms, System: 31.3 ms]
      Range (min … max):   226.0 ms … 259.6 ms    13 runs

    Summary
      'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
	2.22 ± 0.13 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'

The same is true of writing a pack containing the objects for the 30
most-recent commits:

    $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~30; } >in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
    Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     866.5 ms ±  16.2 ms    [User: 1414.5 ms, System: 97.0 ms]
      Range (min … max):   839.3 ms … 886.9 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     581.6 ms ±  10.2 ms    [User: 1181.7 ms, System: 62.6 ms]
      Range (min … max):   567.5 ms … 599.3 ms    10 runs

    Summary
      'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
	1.49 ± 0.04 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'

...and savings on trivial operations like computing the on-disk size of
a single (packed) object are even more dramatic:

    $ git rev-parse HEAD >in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in'
    Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in
      Time (mean ± σ):     305.8 ms ±  11.4 ms    [User: 264.2 ms, System: 41.4 ms]
      Range (min … max):   290.3 ms … 331.1 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in
      Time (mean ± σ):       4.0 ms ±   0.3 ms    [User: 1.7 ms, System: 2.3 ms]
      Range (min … max):     1.6 ms …   4.6 ms    1155 runs

    Summary
      'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran
       76.96 ± 6.25 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in'
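
One way to see why a reverse index matters for this query: an object's
on-disk size can be taken as the distance from its own offset to the
offset of the next object in pack order, which requires knowing pack
order. The sketch below assumes that approach; the function name, the
offsets, and the `data_end` value (where object data stops) are
hypothetical.

```python
def on_disk_size(index_offsets, rev, data_end, pack_pos):
    """Size of the object at `pack_pos`, measured to the next object's offset."""
    start = index_offsets[rev[pack_pos]]
    if pack_pos + 1 < len(rev):
        end = index_offsets[rev[pack_pos + 1]]
    else:
        end = data_end  # last object: runs to the end of the object data
    return end - start


# Hypothetical offsets in ".idx" order; object data assumed to end at 500.
index_offsets = [450, 12, 300, 120]
rev = sorted(range(len(index_offsets)), key=index_offsets.__getitem__)
print([on_disk_size(index_offsets, rev, 500, p) for p in range(len(rev))])
# [108, 180, 150, 50]
```

Without a stored reverse index, the sorted order has to be rebuilt
before even a single such lookup can be answered.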

In the more than two years since e37d0b8730 was merged, Git's
implementation of on-disk reverse indexes has been thoroughly tested,
both by users enabling `pack.writeReverseIndex` and by GitHub's
deployment of the feature, which has run without incident throughout
that time.

This patch changes Git's behavior to write on-disk reverse indexes by
default when indexing a pack, which should make the above operations
faster for everybody's Git installation after a repack.

(The previous commit explains some potential drawbacks of using on-disk
reverse indexes in certain limited circumstances, which essentially boil
down to a trade-off between time to generate and time to access. For
those limited cases, the `pack.readReverseIndex` escape hatch can be
used.)

[1]: https://github.blog/2021-04-29-scaling-monorepo-maintenance/#reverse-indexes

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-13 07:55:46 -07:00
add.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
am.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
annotate.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
apply.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
archive.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
bisect.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
blame.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
branch.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
bugreport.c setup.h: move declarations for setup.c functions from cache.h 2023-03-21 10:56:54 -07:00
bundle.c setup.h: move declarations for setup.c functions from cache.h 2023-03-21 10:56:54 -07:00
cat-file.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
check-attr.c write-or-die.h: move declarations for write-or-die.c functions from cache.h 2023-03-21 10:56:54 -07:00
check-ignore.c write-or-die.h: move declarations for write-or-die.c functions from cache.h 2023-03-21 10:56:54 -07:00
check-mailmap.c write-or-die.h: move declarations for write-or-die.c functions from cache.h 2023-03-21 10:56:54 -07:00
check-ref-format.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
checkout--worker.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
checkout-index.c setup.h: move declarations for setup.c functions from cache.h 2023-03-21 10:56:54 -07:00
checkout.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
clean.c setup.h: move declarations for setup.c functions from cache.h 2023-03-21 10:56:54 -07:00
clone.c Merge branch 'jc/clone-object-format-from-void' 2023-04-11 13:49:13 -07:00
column.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
commit-graph.c environment.h: move declarations for environment.c functions from cache.h 2023-03-21 10:56:53 -07:00
commit-tree.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
commit.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
config.c setup.h: move declarations for setup.c functions from cache.h 2023-03-21 10:56:54 -07:00
count-objects.c environment.h: move declarations for environment.c functions from cache.h 2023-03-21 10:56:53 -07:00
credential-cache--daemon.c abspath.h: move absolute path functions from cache.h 2023-03-21 10:56:52 -07:00
credential-cache.c write-or-die.h: move declarations for write-or-die.c functions from cache.h 2023-03-21 10:56:54 -07:00
credential-store.c write-or-die.h: move declarations for write-or-die.c functions from cache.h 2023-03-21 10:56:54 -07:00
credential.c builtins: mark unused prefix parameters 2023-03-28 14:11:24 -07:00
describe.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
diagnose.c abspath.h: move absolute path functions from cache.h 2023-03-21 10:56:52 -07:00
diff-files.c cocci: apply "pending" index-compatibility to some "builtin/*.c" 2022-11-21 12:06:15 +09:00
diff-index.c setup.h: move declarations for setup.c functions from cache.h 2023-03-21 10:56:54 -07:00
diff-tree.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
diff.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
difftool.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
fast-export.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
fast-import.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
fetch-pack.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
fetch.c Merge branch 'ds/fetch-bundle-uri-with-all' 2023-04-06 13:38:32 -07:00
fmt-merge-msg.c wrapper.h: move declarations for wrapper.c functions from cache.h 2023-03-21 10:56:53 -07:00
for-each-ref.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
for-each-repo.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
fsck.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
fsmonitor--daemon.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
gc.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
get-tar-commit-id.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
grep.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
hash-object.c write-or-die.h: move declarations for write-or-die.c functions from cache.h 2023-03-21 10:56:54 -07:00
help.c setup.h: move declarations for setup.c functions from cache.h 2023-03-21 10:56:54 -07:00
hook.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
index-pack.c config: enable pack.writeReverseIndex by default 2023-04-13 07:55:46 -07:00
init-db.c setup.h: move declarations for setup.c functions from cache.h 2023-03-21 10:56:54 -07:00
interpret-trailers.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
log.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
ls-files.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
ls-remote.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
ls-tree.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
mailinfo.c environment.h: move declarations for environment.c functions from cache.h 2023-03-21 10:56:53 -07:00
mailsplit.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
merge-base.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
merge-file.c setup.h: move declarations for setup.c functions from cache.h 2023-03-21 10:56:54 -07:00
merge-index.c builtins: mark unused prefix parameters 2023-03-28 14:11:24 -07:00
merge-ours.c builtins: mark unused prefix parameters 2023-03-28 14:11:24 -07:00
merge-recursive.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
merge-tree.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
merge.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
mktag.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
mktree.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
multi-pack-index.c environment.h: move declarations for environment.c functions from cache.h 2023-03-21 10:56:53 -07:00
mv.c setup.h: move declarations for setup.c functions from cache.h 2023-03-21 10:56:54 -07:00
name-rev.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
notes.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
pack-objects.c config: enable pack.writeReverseIndex by default 2023-04-13 07:55:46 -07:00
pack-redundant.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
pack-refs.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
patch-id.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
prune-packed.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
prune.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
pull.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
push.c environment.h: move declarations for environment.c functions from cache.h 2023-03-21 10:56:53 -07:00
range-diff.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
read-tree.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
rebase.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
receive-pack.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
reflog.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
remote-ext.c builtins: annotate always-empty prefix parameters 2023-03-28 14:11:24 -07:00
remote-fd.c builtins: annotate always-empty prefix parameters 2023-03-28 14:11:24 -07:00
remote.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
repack.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
replace.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
rerere.c wrapper.h: move declarations for wrapper.c functions from cache.h 2023-03-21 10:56:53 -07:00
reset.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
rev-list.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
rev-parse.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
revert.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
rm.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
send-pack.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
shortlog.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
show-branch.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
show-index.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
show-ref.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
sparse-checkout.c Merge branch 'ws/sparse-check-rules' 2023-04-11 13:49:12 -07:00
stash.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
stripspace.c write-or-die.h: move declarations for write-or-die.c functions from cache.h 2023-03-21 10:56:54 -07:00
submodule--helper.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
symbolic-ref.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
tag.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
unpack-file.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
unpack-objects.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
update-index.c write-or-die.h: move declarations for write-or-die.c functions from cache.h 2023-03-21 10:56:54 -07:00
update-ref.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
update-server-info.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
upload-archive.c builtins: annotate always-empty prefix parameters 2023-03-28 14:11:24 -07:00
upload-pack.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
var.c builtins: mark unused prefix parameters 2023-03-28 14:11:24 -07:00
verify-commit.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
verify-pack.c treewide: be explicit about dependence on gettext.h 2023-03-21 10:56:51 -07:00
verify-tag.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
worktree.c Merge branch 'en/header-split-cleanup' 2023-04-06 13:38:31 -07:00
write-tree.c environment.h: move declarations for environment.c functions from cache.h 2023-03-21 10:56:53 -07:00