git/reftable
Patrick Steinhardt 718a93ecc0 reftable/blocksource: use mmap to read tables
The blocksource interface provides a way to read blocks from a reftable
table. It is implemented using read(3P) calls on the underlying file
descriptor. While this works alright, this pattern
is very inefficient when repeatedly querying the reftable stack for one
or more refs. This inefficiency can mostly be attributed to the fact
that we often need to re-read the same blocks over and over again, and
every single time we need to call read(3P) again.

A natural fit in this context is to use mmap(3P) instead of read(3P),
which has a bunch of benefits:

  - We do not need to come up with a caching strategy for some of the
    blocks as this will be handled by the kernel already.

  - We can avoid the overhead of having to call into the read(3P)
    syscall repeatedly.

  - We do not need to allocate returned blocks repeatedly, but can
    instead hand out pointers into the mmapped region directly.
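
The last point can be illustrated with a minimal sketch. It is not the
code from blocksource.c: `struct mmap_source` and the `mmap_source_*()`
functions are hypothetical names chosen for this example, and it assumes
a POSIX system with a working mmap(3P).

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical type for illustration; not the reftable library's struct. */
struct mmap_source {
	const uint8_t *data; /* start of the mapped file */
	size_t size;         /* length of the mapping in bytes */
};

/* Map the whole file read-only. Returns 0 on success, -1 on error. */
static int mmap_source_open(struct mmap_source *src, const char *path)
{
	struct stat st;
	void *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	/* Empty files cannot be mapped, so treat them as an error here. */
	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return -1;
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd); /* the mapping stays valid after the descriptor is closed */
	if (p == MAP_FAILED)
		return -1;
	src->data = p;
	src->size = (size_t)st.st_size;
	return 0;
}

/*
 * Hand out a pointer into the mapping instead of copying into a freshly
 * allocated buffer. Returns the number of bytes available at `off` (which
 * may be less than `len` near the end of the file), or -1 if `off` lies
 * beyond the end of the file.
 */
static long mmap_source_read_block(const struct mmap_source *src,
				   const uint8_t **dest,
				   size_t off, size_t len)
{
	if (off > src->size)
		return -1;
	if (len > src->size - off)
		len = src->size - off;
	*dest = src->data + off;
	return (long)len;
}

/* Drop the mapping; pointers handed out earlier become invalid. */
static void mmap_source_close(struct mmap_source *src)
{
	if (src->data)
		munmap((void *)src->data, src->size);
	src->data = NULL;
	src->size = 0;
}
```

In this sketch, returning a block is plain pointer arithmetic, so reading
the same block repeatedly costs neither a syscall nor an allocation. The
caller must not use a returned pointer after the source has been closed.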

Using mmap comes with a significant drawback on Windows though, because
mmapped files cannot be deleted and neither is it possible to rename
files onto an mmapped file. Neither restriction is a problem in
practice. First, the reftable library already gracefully handles the
case where auto-compaction cannot delete a still-open stack and ignores
any such errors. Also, `reftable_stack_clean()` will prune stale tables
which are not referenced by "tables.list" anymore so that those files
are eventually removed. Second, we never rewrite already-written
stacks, so it does not matter that we cannot rename a file over an
mmapped file, either.

Another unfortunate property of mmap is that it is not supported by all
systems. But given that the size of reftables should typically be rather
limited (megabytes at most in the vast majority of repositories), we can
use the fallback implementation provided by `git_mmap()` which reads the
whole file into memory instead. This is the same strategy that the
"packed" backend uses.

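The general idea behind such a fallback can be sketched as follows. This
is an illustration only and not the code of `git_mmap()`: the helper
name `read_whole_region()` is hypothetical, and treating a short file as
an error is an assumption made for this example.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: emulate a read-only mapping by reading `len` bytes
 * starting at offset `off` into a heap buffer. Returns NULL on failure.
 * The caller releases the buffer with free() instead of munmap().
 */
static void *read_whole_region(int fd, size_t len, off_t off)
{
	size_t done = 0;
	char *buf = malloc(len ? len : 1);

	if (!buf)
		return NULL;
	while (done < len) {
		ssize_t n = pread(fd, buf + done, len - done,
				  off + (off_t)done);
		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted, retry the read */
			free(buf);
			return NULL;
		}
		if (n == 0) {
			/* Assumption: a file shorter than `len` is an error. */
			free(buf);
			return NULL;
		}
		done += (size_t)n;
	}
	return buf;
}
```

Because the whole table ends up in memory either way, callers can keep
handing out pointers into the buffer exactly as they would with a real
mapping. The cost is one up-front copy proportional to the table size.
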
This change doesn't significantly improve performance in the case where
we're seeking through stacks once (as e.g. git-for-each-ref(1) would),
but it does speed up use cases where there is lots of random access to
refs, e.g. when writing. The following benchmark demonstrates
these savings with git-update-ref(1) creating N refs in an otherwise
empty repository:

  Benchmark 1: update-ref: create many refs (refcount = 1, revision = HEAD~)
    Time (mean ± σ):       5.1 ms ±   0.2 ms    [User: 2.5 ms, System: 2.5 ms]
    Range (min … max):     4.8 ms …   7.1 ms    111 runs

  Benchmark 2: update-ref: create many refs (refcount = 100, revision = HEAD~)
    Time (mean ± σ):      14.8 ms ±   0.5 ms    [User: 7.1 ms, System: 7.5 ms]
    Range (min … max):    14.1 ms …  18.7 ms    84 runs

  Benchmark 3: update-ref: create many refs (refcount = 10000, revision = HEAD~)
    Time (mean ± σ):     926.4 ms ±   5.6 ms    [User: 448.5 ms, System: 477.7 ms]
    Range (min … max):   920.0 ms … 936.1 ms    10 runs

  Benchmark 4: update-ref: create many refs (refcount = 1, revision = HEAD)
    Time (mean ± σ):       5.0 ms ±   0.2 ms    [User: 2.4 ms, System: 2.5 ms]
    Range (min … max):     4.7 ms …   5.4 ms    111 runs

  Benchmark 5: update-ref: create many refs (refcount = 100, revision = HEAD)
    Time (mean ± σ):      10.5 ms ±   0.2 ms    [User: 5.0 ms, System: 5.3 ms]
    Range (min … max):    10.0 ms …  10.9 ms    93 runs

  Benchmark 6: update-ref: create many refs (refcount = 10000, revision = HEAD)
    Time (mean ± σ):     529.6 ms ±   9.1 ms    [User: 268.0 ms, System: 261.4 ms]
    Range (min … max):   522.4 ms … 547.1 ms    10 runs

  Summary
    update-ref: create many refs (refcount = 1, revision = HEAD) ran
      1.01 ± 0.06 times faster than update-ref: create many refs (refcount = 1, revision = HEAD~)
      2.08 ± 0.07 times faster than update-ref: create many refs (refcount = 100, revision = HEAD)
      2.95 ± 0.14 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~)
    105.33 ± 3.76 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD)
    184.24 ± 5.89 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~)

Theoretically, we could also replicate the strategy of the "packed"
backend where small tables are read into memory instead of using mmap.
Benchmarks did not confirm that this has a performance benefit though.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-11 12:10:59 -08:00
basics.c reftable: utility functions 2021-10-08 10:45:48 -07:00
basics.h reftable: utility functions 2021-10-08 10:45:48 -07:00
basics_test.c reftable: utility functions 2021-10-08 10:45:48 -07:00
block.c reftable/block: reuse buffer to compute record keys 2023-12-11 07:23:17 -08:00
block.h reftable/block: reuse buffer to compute record keys 2023-12-11 07:23:17 -08:00
block_test.c reftable/block: introduce macro to initialize struct block_iter 2023-12-11 07:23:17 -08:00
blocksource.c reftable/blocksource: use mmap to read tables 2024-01-11 12:10:59 -08:00
blocksource.h reftable: add blocksource, an abstraction for random access reads 2021-10-08 10:45:48 -07:00
constants.h reftable: (de)serialization for the polymorphic record type. 2021-10-08 10:45:48 -07:00
dump.c hash-ll.h: split out of hash.h to remove dependency on repository.h 2023-04-24 12:47:32 -07:00
error.c reftable: ensure git-compat-util.h is the first (indirect) include 2023-04-24 12:47:33 -07:00
generic.c reftable: make assignments portable to AIX xlc v12.01 2022-03-28 13:58:10 -07:00
generic.h reftable: generic interface to tables 2021-10-08 10:45:48 -07:00
iter.c reftable: make reftable_record a tagged union 2022-01-20 11:31:53 -08:00
iter.h reftable/block: introduce macro to initialize struct block_iter 2023-12-11 07:23:17 -08:00
LICENSE
merged.c reftable/merged: reuse buffer to compute record keys 2023-12-11 07:23:16 -08:00
merged.h reftable/merged: reuse buffer to compute record keys 2023-12-11 07:23:16 -08:00
merged_test.c reftable tests: avoid "int" overflow, use "uint64_t" 2022-01-13 13:39:09 -08:00
pq.c reftable: use a pointer for pq_entry param 2022-09-15 11:32:37 -07:00
pq.h reftable: use a pointer for pq_entry param 2022-09-15 11:32:37 -07:00
pq_test.c reftable: use a pointer for pq_entry param 2022-09-15 11:32:37 -07:00
publicbasics.c reftable: ensure git-compat-util.h is the first (indirect) include 2023-04-24 12:47:33 -07:00
reader.c reftable/block: introduce macro to initialize struct block_iter 2023-12-11 07:23:17 -08:00
reader.h reftable: read reftable files 2021-10-08 10:45:48 -07:00
readwrite_test.c reftable/stack: fix use of unseeded randomness 2023-12-11 07:23:16 -08:00
record.c reftable: add print functions to the record types 2022-01-20 11:31:53 -08:00
record.h reftable: add print functions to the record types 2022-01-20 11:31:53 -08:00
record_test.c reftable: make assignments portable to AIX xlc v12.01 2022-03-28 13:58:10 -07:00
refname.c reftable: implement refname validation 2021-10-08 10:45:48 -07:00
refname.h reftable: implement refname validation 2021-10-08 10:45:48 -07:00
refname_test.c reftable: implement refname validation 2021-10-08 10:45:48 -07:00
reftable-blocksource.h reftable: add blocksource, an abstraction for random access reads 2021-10-08 10:45:48 -07:00
reftable-error.h reftable: signal overflow 2021-12-23 12:28:34 -08:00
reftable-generic.h reftable: generic interface to tables 2021-10-08 10:45:48 -07:00
reftable-iterator.h reftable: generic interface to tables 2021-10-08 10:45:48 -07:00
reftable-malloc.h reftable: utility functions 2021-10-08 10:45:48 -07:00
reftable-merged.h reftable: add merged table view 2021-10-08 10:45:48 -07:00
reftable-reader.h reftable: read reftable files 2021-10-08 10:45:48 -07:00
reftable-record.h reftable: make reftable-record.h function signatures const correct 2022-01-20 11:31:53 -08:00
reftable-stack.h reftable: implement stack, a mutable database of reftable files. 2021-10-08 10:45:48 -07:00
reftable-tests.h reftable: add a heap-based priority queue for reftable records 2021-10-08 10:45:48 -07:00
reftable-writer.h reftable: rename writer_stats to reftable_writer_stats 2022-02-23 13:36:26 -08:00
stack.c reftable/stack: use stat info to avoid re-reading stack list 2024-01-11 12:10:59 -08:00
stack.h reftable/stack: use stat info to avoid re-reading stack list 2024-01-11 12:10:59 -08:00
stack_test.c reftable/stack: perform auto-compaction with transactional interface 2023-12-11 07:23:16 -08:00
system.h reftable/stack: use stat info to avoid re-reading stack list 2024-01-11 12:10:59 -08:00
test_framework.c reftable: utility functions 2021-10-08 10:45:48 -07:00
test_framework.h reftable: wrap EXPECT macros in do/while 2023-12-11 07:23:15 -08:00
tree.c reftable: ensure git-compat-util.h is the first (indirect) include 2023-04-24 12:47:33 -07:00
tree.h reftable: a generic binary tree implementation 2021-10-08 10:45:48 -07:00
tree_test.c reftable: ensure git-compat-util.h is the first (indirect) include 2023-04-24 12:47:33 -07:00
writer.c Merge branch 'ep/maint-equals-null-cocci' 2022-05-20 15:26:59 -07:00
writer.h reftable: write reftable files 2021-10-08 10:45:48 -07:00