git/reftable
Patrick Steinhardt a96e9a20f3 reftable/merged: allocation-less dropping of shadowed records
The purpose of the merged reftable iterator is to iterate through all
entries of a set of tables in the correct order. This is implemented by
using a sub-iterator for each table, where the next entry of each of
these iterators gets put into a priority queue. For each iteration, we
do roughly the following steps:

  1. Retrieve the top record of the priority queue. This is the entry we
     want to return to the caller.

  2. Retrieve the next record of the sub-iterator that this record came
     from. If any, add it to the priority queue at the correct position.
     The position is determined by comparing the record keys, which e.g.
     corresponds to the refname for ref records.

  3. Keep removing the top record of the priority queue until we hit the
     first entry whose key is larger than the returned record's key.
     This is required to drop "shadowed" records.

The last step will lead to at least one comparison to the next entry,
but may lead to many comparisons in case the reftable stack consists of
many tables with shadowed records. It is thus part of the hot code path
when iterating through records.

The code to compare the entries with each other is quite inefficient
though. Instead of comparing record keys with each other directly, we
first format them into `struct strbuf`s and only then compare them with
each other. While we already optimized this code path to reuse buffers
in 829231dc20 (reftable/merged: reuse buffer to compute record keys,
2023-12-11), the cost to format the keys into the buffers still adds up
quite significantly.

Refactor the code to use `reftable_record_cmp()` instead, which has been
introduced in the preceding commit. This function compares records with
each other directly without requiring any memory allocations or copying
and is thus way more efficient.

The following benchmark uses git-show-ref(1) to print a single ref
matching a pattern out of 1 million refs. This is the most direct way to
exercise ref iteration speed as we remove all overhead of having to show
the refs, too.

    Benchmark 1: show-ref: single matching ref (revision = HEAD~)
      Time (mean ± σ):     180.7 ms ±   4.7 ms    [User: 177.1 ms, System: 3.4 ms]
      Range (min … max):   174.9 ms … 211.7 ms    1000 runs

    Benchmark 2: show-ref: single matching ref (revision = HEAD)
      Time (mean ± σ):     162.1 ms ±   4.4 ms    [User: 158.5 ms, System: 3.4 ms]
      Range (min … max):   155.4 ms … 189.3 ms    1000 runs

    Summary
      show-ref: single matching ref (revision = HEAD) ran
        1.11 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:18:04 -08:00
..
basics.c reftable: utility functions 2021-10-08 10:45:48 -07:00
basics.h reftable: utility functions 2021-10-08 10:45:48 -07:00
basics_test.c reftable: utility functions 2021-10-08 10:45:48 -07:00
block.c reftable/block: reuse buffer to compute record keys 2023-12-11 07:23:17 -08:00
block.h reftable/block: reuse buffer to compute record keys 2023-12-11 07:23:17 -08:00
block_test.c reftable/record: store "val1" hashes as static arrays 2024-01-03 09:54:20 -08:00
blocksource.c reftable/blocksource: use mmap to read tables 2024-01-11 12:10:59 -08:00
blocksource.h reftable: add blocksource, an abstraction for random access reads 2021-10-08 10:45:48 -07:00
constants.h reftable: (de)serialization for the polymorphic record type. 2021-10-08 10:45:48 -07:00
dump.c treewide: remove unnecessary includes in source files 2023-12-26 12:04:31 -08:00
error.c reftable: ensure git-compat-util.h is the first (indirect) include 2023-04-24 12:47:33 -07:00
generic.c treewide: remove unnecessary includes in source files 2023-12-26 12:04:31 -08:00
generic.h reftable: generic interface to tables 2021-10-08 10:45:48 -07:00
iter.c reftable: make reftable_record a tagged union 2022-01-20 11:31:53 -08:00
iter.h reftable/block: introduce macro to initialize struct block_iter 2023-12-11 07:23:17 -08:00
LICENSE
merged.c reftable/merged: allocation-less dropping of shadowed records 2024-02-12 09:18:04 -08:00
merged.h reftable/merged: allocation-less dropping of shadowed records 2024-02-12 09:18:04 -08:00
merged_test.c Merge branch 'ps/reftable-fixes-and-optims' 2024-01-16 10:11:57 -08:00
pq.c reftable: use a pointer for pq_entry param 2022-09-15 11:32:37 -07:00
pq.h reftable: use a pointer for pq_entry param 2022-09-15 11:32:37 -07:00
pq_test.c reftable: use a pointer for pq_entry param 2022-09-15 11:32:37 -07:00
publicbasics.c reftable: ensure git-compat-util.h is the first (indirect) include 2023-04-24 12:47:33 -07:00
reader.c Merge branch 'en/header-cleanup' 2024-01-08 14:05:15 -08:00
reader.h reftable: read reftable files 2021-10-08 10:45:48 -07:00
readwrite_test.c Merge branch 'ps/reftable-fixes-and-optims' 2024-01-16 10:11:57 -08:00
record.c reftable/record: introduce function to compare records by key 2024-02-12 09:18:04 -08:00
record.h reftable/record: introduce function to compare records by key 2024-02-12 09:18:04 -08:00
record_test.c reftable/record: store "val2" hashes as static arrays 2024-01-03 09:54:21 -08:00
refname.c reftable: implement refname validation 2021-10-08 10:45:48 -07:00
refname.h reftable: implement refname validation 2021-10-08 10:45:48 -07:00
refname_test.c treewide: remove unnecessary includes in source files 2023-12-26 12:04:31 -08:00
reftable-blocksource.h reftable: add blocksource, an abstraction for random access reads 2021-10-08 10:45:48 -07:00
reftable-error.h reftable: signal overflow 2021-12-23 12:28:34 -08:00
reftable-generic.h reftable: generic interface to tables 2021-10-08 10:45:48 -07:00
reftable-iterator.h reftable: generic interface to tables 2021-10-08 10:45:48 -07:00
reftable-malloc.h reftable: utility functions 2021-10-08 10:45:48 -07:00
reftable-merged.h reftable: add merged table view 2021-10-08 10:45:48 -07:00
reftable-reader.h reftable: read reftable files 2021-10-08 10:45:48 -07:00
reftable-record.h reftable/record: store "val2" hashes as static arrays 2024-01-03 09:54:21 -08:00
reftable-stack.h reftable: implement stack, a mutable database of reftable files. 2021-10-08 10:45:48 -07:00
reftable-tests.h reftable: add a heap-based priority queue for reftable records 2021-10-08 10:45:48 -07:00
reftable-writer.h reftable: rename writer_stats to reftable_writer_stats 2022-02-23 13:36:26 -08:00
stack.c Merge branch 'ps/reftable-optimize-io' 2024-01-29 16:02:59 -08:00
stack.h reftable/stack: fix race in up-to-date check 2024-01-18 12:02:09 -08:00
stack_test.c Merge branch 'ps/reftable-fixes-and-optims' 2024-01-16 10:11:57 -08:00
system.h reftable/stack: fix race in up-to-date check 2024-01-18 12:02:09 -08:00
test_framework.c treewide: remove unnecessary includes in source files 2023-12-26 12:04:31 -08:00
test_framework.h reftable: wrap EXPECT macros in do/while 2023-12-11 07:23:15 -08:00
tree.c reftable: ensure git-compat-util.h is the first (indirect) include 2023-04-24 12:47:33 -07:00
tree.h reftable: a generic binary tree implementation 2021-10-08 10:45:48 -07:00
tree_test.c treewide: remove unnecessary includes in source files 2023-12-26 12:04:31 -08:00
writer.c reftable/writer: fix index corruption when writing multiple indices 2024-01-03 09:54:20 -08:00
writer.h reftable: write reftable files 2021-10-08 10:45:48 -07:00