Commit graph

161 commits

Author SHA1 Message Date
Junio C Hamano 82a31ec324 Merge branch 'jt/reftable-geometric-compaction'
The strategy for compacting multiple reftable tables after many
operations have accumulated entries has been improved to avoid
leaving too many tables uncompacted.

* jt/reftable-geometric-compaction:
  reftable/stack: use geometric table compaction
  reftable/stack: add env to disable autocompaction
  reftable/stack: expose option to disable auto-compaction
2024-04-16 14:50:30 -07:00
Junio C Hamano 7fbe3ead19 Merge branch 'ps/reftable-binsearch-updates'
Reftable code clean-up and some bugfixes.

* ps/reftable-binsearch-updates:
  reftable/block: avoid decoding keys when searching restart points
  reftable/record: extract function to decode key lengths
  reftable/block: fix error handling when searching restart points
  reftable/block: refactor binary search over restart points
  reftable/refname: refactor binary search over refnames
  reftable/basics: improve `binsearch()` test
  reftable/basics: fix return type of `binsearch()` to be `size_t`
2024-04-12 11:31:39 -07:00
Junio C Hamano eacfd581d2 Merge branch 'ps/pack-refs-auto'
"git pack-refs" learned the "--auto" option, which is a useful
addition to be triggered from "git gc --auto".

Acked-by: Karthik Nayak <karthik.188@gmail.com>
cf. <CAOLa=ZRAEA7rSUoYL0h-2qfEELdbPHbeGpgBJRqesyhHi9Q6WQ@mail.gmail.com>

* ps/pack-refs-auto:
  builtin/gc: pack refs when using `git maintenance run --auto`
  builtin/gc: forward git-gc(1)'s `--auto` flag when packing refs
  t6500: extract objects with "17" prefix
  builtin/gc: move `struct maintenance_run_opts`
  builtin/pack-refs: introduce new "--auto" flag
  builtin/pack-refs: release allocated memory
  refs/reftable: expose auto compaction via new flag
  refs: remove `PACK_REFS_ALL` flag
  refs: move `struct pack_refs_opts` to where it's used
  t/helper: drop pack-refs wrapper
  refs/reftable: print errors on compaction failure
  reftable/stack: gracefully handle failed auto-compaction due to locks
  reftable/stack: use error codes when locking fails during compaction
  reftable/error: discern locked/outdated errors
  reftable/stack: fix error handling in `reftable_stack_init_addition()`
2024-04-09 14:31:45 -07:00
Justin Tobler a949ebd342 reftable/stack: use geometric table compaction
To reduce the number of on-disk reftables, compaction is performed.
Contiguous tables whose sizes share the same binary logarithm are
grouped into segments. The segment with the lowest binary log value
that contains more than one table is chosen as the starting point when
identifying the compaction segment.

Since segments containing a single table are not initially considered
for compaction, a newly appended table whose log value does not match
that of the previous table never triggers compaction. The table list
can therefore grow without bound. This can be demonstrated by repeating
the following sequence:

git branch -f foo
git branch -d foo

Each operation results in a new table being written with no compaction
occurring until a separate operation produces a table matching the
previous table log value.

Instead, to avoid unbounded growth of the table list, the compaction
strategy is updated to ensure tables follow a geometric sequence after
each operation by individually evaluating each table in reverse index
order. This strategy results in a much simpler and more robust algorithm
compared to the previous one while also maintaining a minimal ordered
set of tables on-disk.
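
A minimal sketch of how such a compaction segment could be selected,
assuming a `sizes` array ordered from oldest to newest table; the helper
name, the `struct segment`, and the hard-coded factor of two are
illustrative rather than the exact code that landed in reftable/stack.c:

```
#include <stddef.h>
#include <stdint.h>

struct segment {
	size_t start, end; /* half-open range [start, end) of tables */
};

static struct segment suggest_compaction_segment(const uint64_t *sizes,
						 size_t n)
{
	struct segment seg = { 0 };
	uint64_t bytes = 0;
	size_t i;

	/*
	 * Find the newest table that violates the geometric invariant,
	 * i.e. whose predecessor is not at least twice its size.
	 */
	for (i = n; i > 1; i--) {
		if (sizes[i - 2] < sizes[i - 1] * 2) {
			seg.end = i;
			bytes = sizes[i - 1];
			break;
		}
	}
	if (!seg.end)
		return seg; /* already geometric: nothing to compact */

	/*
	 * Grow the segment towards older tables for as long as the
	 * accumulated segment size still violates the invariant.
	 */
	for (i = seg.end - 1; i > 0; i--) {
		if (sizes[i - 1] >= bytes * 2)
			break;
		bytes += sizes[i - 1];
	}
	seg.start = i;
	return seg;
}
```

Compacting the tables in `[seg.start, seg.end)` into a single table then
brings the list back towards a geometric sequence.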

When creating 10 thousand references, the new strategy has no
performance impact:

Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
  Time (mean ± σ):     26.516 s ±  0.047 s    [User: 17.864 s, System: 8.491 s]
  Range (min … max):   26.447 s … 26.569 s    10 runs

Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
  Time (mean ± σ):     26.417 s ±  0.028 s    [User: 17.738 s, System: 8.500 s]
  Range (min … max):   26.366 s … 26.444 s    10 runs

Summary
  update-ref: create refs sequentially (revision = HEAD) ran
    1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)

Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
tables and are therefore updated to specify the correct new table count.
Since compaction is more aggressive in ensuring tables maintain a
geometric sequence, the expected table count is reduced in these tests.
In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
removed because the function is no longer needed. Also, the
`test_suggest_compaction_segment()` test is updated to better showcase
and reflect the new geometric compaction behavior.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08 12:11:10 -07:00
Justin Tobler bc91330cec reftable/stack: expose option to disable auto-compaction
The reftable stack already has a variable to configure whether or not to
run auto-compaction, but it is inaccessible to users of the library.
There exist use cases where a caller may want to have more control over
auto-compaction.

Move the `disable_auto_compact` option into `reftable_write_options` to
allow external callers to disable auto-compaction. This will be used in
a subsequent commit.
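
A sketch of the resulting API shape; the field's exact type and position
inside `struct reftable_write_options` are abbreviated here:

```
struct reftable_write_options {
	/* ... existing write options ... */

	/* If set, do not run auto-compaction after appending a table. */
	unsigned disable_auto_compact : 1;
};

/* A caller opting out of auto-compaction might then do: */
struct reftable_write_options opts = {
	.disable_auto_compact = 1,
};
```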

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08 12:11:10 -07:00
Junio C Hamano 7424fb7797 Merge branch 'ps/pack-refs-auto' into jt/reftable-geometric-compaction
* ps/pack-refs-auto:
  builtin/gc: pack refs when using `git maintenance run --auto`
  builtin/gc: forward git-gc(1)'s `--auto` flag when packing refs
  t6500: extract objects with "17" prefix
  builtin/gc: move `struct maintenance_run_opts`
  builtin/pack-refs: introduce new "--auto" flag
  builtin/pack-refs: release allocated memory
  refs/reftable: expose auto compaction via new flag
  refs: remove `PACK_REFS_ALL` flag
  refs: move `struct pack_refs_opts` to where it's used
  t/helper: drop pack-refs wrapper
  refs/reftable: print errors on compaction failure
  reftable/stack: gracefully handle failed auto-compaction due to locks
  reftable/stack: use error codes when locking fails during compaction
  reftable/error: discern locked/outdated errors
  reftable/stack: fix error handling in `reftable_stack_init_addition()`
2024-04-05 10:34:23 -07:00
Patrick Steinhardt d51d8cc368 reftable/block: avoid decoding keys when searching restart points
When searching over restart points in a block we decode the key of each
of the records, which results in a memory allocation. This is quite
pointless though given that records at restart points will never use
prefix compression and thus store their keys verbatim in the block.

Refactor the code so that we can avoid decoding the keys, which saves us
some allocations.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-03 09:16:50 -07:00
Patrick Steinhardt cd75790707 reftable/record: extract function to decode key lengths
We're about to refactor the binary search over restart points so that it
does not need to fully decode the record keys anymore. To do so we will
need to decode the record key lengths, which is non-trivial logic.

Extract the logic to decode these lengths from `reftable_decode_key()`
so that we can reuse it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-03 09:16:50 -07:00
Patrick Steinhardt f9e88544f5 reftable/block: fix error handling when searching restart points
When doing the binary search over restart points in a block we need to
decode the record keys. This decoding step can result in an error when
the block is corrupted, which we indicate to the caller of the binary
search by setting `args.error = 1`. But the only existing caller
mishandles this because it performs the error check before calling
`binsearch()` instead of after it.

Fix this bug by checking for errors at the right point in time.
Furthermore, refactor `binsearch()` so that it aborts the search in case
the callback function returns a negative value so that we don't
needlessly continue to search the block.
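
A sketch of the adjusted search loop; the contract follows `binsearch()`
from reftable/basics.h (find the first index for which the callback
returns a truish value), and the abort path for negative callback
returns is the part this commit adds:

```
#include <stddef.h>

/*
 * Return the first index in [0, sz) for which f() returns a positive
 * value, or sz if there is no such index or if f() reports an error
 * by returning a negative value.
 */
size_t binsearch(size_t sz, int (*f)(size_t k, void *args), void *args)
{
	size_t lo = 0, hi = sz;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int ret = f(mid, args);

		if (ret < 0)
			return sz;    /* corrupt input: abort the search */
		if (ret > 0)
			hi = mid;     /* mid satisfies f(), search left */
		else
			lo = mid + 1; /* mid does not, search right */
	}

	return lo;
}
```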

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-03 09:16:50 -07:00
Patrick Steinhardt 77307a61d6 reftable/block: refactor binary search over restart points
When seeking a record in our block reader we perform a binary search
over the block's restart points so that we don't have to do a linear
scan over the whole block. The logic to do so is quite intricate though,
which makes it hard to understand.

Improve documentation and rename some of the functions and variables so
that the code becomes easier to understand overall. This refactoring
should not result in any change in behaviour.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-03 09:16:50 -07:00
Patrick Steinhardt 21767925b0 reftable/refname: refactor binary search over refnames
It is comparatively hard to understand how exactly the binary search
over refnames works given that the function and variable names are not
exactly easy to grasp. Rename them to make this more obvious. This
should not result in any change in behaviour.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-03 09:16:50 -07:00
Patrick Steinhardt e8b808258e reftable/basics: improve binsearch() test
The `binsearch()` test is somewhat weird in that it doesn't explicitly
spell out its expectations. Instead it does so in a rather ad-hoc way
with some hard-to-understand computations.

Refactor the test to spell out the needle as well as expected index for
all testcases. This refactoring highlights that the `binsearch_func()`
is written somewhat weirdly to find the first integer smaller than the
needle, not smaller or equal to it. Adjust the function accordingly.

While at it, rename the callback function to better convey its meaning.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-03 09:16:49 -07:00
Patrick Steinhardt 3e7b36d129 reftable/basics: fix return type of binsearch() to be size_t
The `binsearch()` function can be used to find the first element for
which a callback function returns a truish value. But while the array
size is of type `size_t`, the function in fact returns an `int` that is
supposed to index into that array.

Fix the function signature to return a `size_t`. This conversion does
not change any semantics given that the function would only ever return
a value in the range `[0, sz]` anyway.
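
In terms of the function signature the change boils down to the
following; parameter names are abbreviated:

```
/* Before: an int cannot index the full range of a size_t-sized array. */
int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args);

/* After: the return type matches the array size it indexes into. */
size_t binsearch(size_t sz, int (*f)(size_t k, void *args), void *args);
```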

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-03 09:16:49 -07:00
Junio C Hamano 6938b355c0 Merge branch 'ps/reftable-unit-test-nfs-workaround'
A unit test for reftable code tried to enumerate all files in a
directory after reftable operations, expecting to see nothing but the
files it wanted to leave there, but was fooled by leftover .nfs* cruft
files; this has been corrected.

* ps/reftable-unit-test-nfs-workaround:
  reftable: fix tests being broken by NFS' delete-after-close semantics
2024-04-01 13:21:35 -07:00
Patrick Steinhardt a2f711ade0 reftable/stack: gracefully handle failed auto-compaction due to locks
Whenever we commit a new table to the reftable stack we will end up
invoking auto-compaction of the stack to keep the total number of tables
at bay. This auto-compaction may fail, however, when at least one of
the tables we are about to compact is locked, which the compaction
function indicates by returning `REFTABLE_LOCK_ERROR`. We do not handle
this case, and thus bubble that return value up the calling chain,
where it ultimately causes a failure.

Fix this bug by ignoring `REFTABLE_LOCK_ERROR`.
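
A sketch of the fix at the commit site, assuming the public
`reftable_stack_auto_compact()` entry point; variable names are
simplified:

```
/*
 * Auto-compaction is best-effort: when a concurrent writer holds a
 * lock on one of the tables, skip compaction rather than failing the
 * whole commit.
 */
err = reftable_stack_auto_compact(st);
if (err < 0 && err != REFTABLE_LOCK_ERROR)
	goto done;
err = 0;
```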

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25 09:54:07 -07:00
Patrick Steinhardt 33358350eb reftable/stack: use error codes when locking fails during compaction
Compaction of a reftable stack may fail gracefully when there is a
concurrent process that writes to the reftable stack and which has thus
locked either the "tables.list" file or one of the tables. This is
expected and can be handled gracefully by some of the callers which
invoke compaction. Thus, to indicate this situation to our callers, we
return a positive return code from `stack_compact_range()` and bubble it
up to the caller.

This kind of error handling is somewhat awkward though as many callers
in the call chain never even think of handling positive return values.
Thus, the result is either that such errors are swallowed by accident,
or that we abort operations with an unhelpful error message.

Make the code more robust by always using negative error codes when
compaction fails, with `REFTABLE_LOCK_ERROR` for the described benign
error case.

Note that only a single callsite knew to handle positive error codes
gracefully in the first place. Subsequent commits will touch up some of
the other sites to handle those errors better.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25 09:51:11 -07:00
Patrick Steinhardt af18098c9d reftable/error: discern locked/outdated errors
We currently throw two different errors into a similar-but-different
error code:

  - Errors when trying to lock the reftable stack.

  - Errors when trying to write to the reftable stack which has been
    modified concurrently.

This results in unclear error handling and user-visible error messages.

Create a new `REFTABLE_OUTDATED_ERROR` so that those error conditions
can be clearly told apart from each other. Adjust users of the old
`REFTABLE_LOCK_ERROR` to use the new error code as required.
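
Sketched as the distinction appears to users of
reftable/reftable-error.h; the concrete negative values are elided here:

```
enum reftable_error {
	/* ... other error codes, all negative in the real header ... */

	/* Somebody else holds a lock on the stack or one of its tables. */
	REFTABLE_LOCK_ERROR,

	/* The stack changed concurrently; the caller should reload and retry. */
	REFTABLE_OUTDATED_ERROR,
};
```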

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25 09:51:11 -07:00
Patrick Steinhardt 630942a873 reftable/stack: fix error handling in reftable_stack_init_addition()
In `reftable_stack_init_addition()` we call `stack_uptodate()` after
having created the lockfile to check whether the stack was modified
concurrently, which is indicated by a positive return code from the
latter function. If so, we return a `REFTABLE_LOCK_ERROR` to the caller
and abort the addition.

The error handling has an off-by-one though because we check whether the
error code is `> 1` instead of `> 0`. Thus, instead of returning the
locking error, we would return a positive value. One of the callers of
`reftable_stack_init_addition()` works around this bug by repeating the
error code check without the off-by-one. But other callers are subtly
broken by this bug.

Fix this by checking for `err > 0` instead. This has the consequence
that `reftable_stack_init_addition()` won't ever return a positive error
code anymore, but will instead return `REFTABLE_LOCK_ERROR` now. Thus,
we can drop the check for a positive error code in `stack_try_add()`
now.
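
A simplified sketch of the fix; `stack_uptodate()` returns a positive
value when the stack on disk no longer matches what we have loaded, and
variable names are abbreviated:

```
err = stack_uptodate(st);
if (err < 0)
	goto done;
if (err > 0) { /* was: err > 1, which let the value 1 escape untranslated */
	err = REFTABLE_LOCK_ERROR;
	goto done;
}
```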

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25 09:51:11 -07:00
Junio C Hamano e8c1cda9a9 Merge branch 'ps/reftable-reflog-iteration-perf'
The code to iterate over reflogs in the reftable has been optimized
to reduce memory allocation and deallocation.

Reviewed-by: Josh Steadmon <steadmon@google.com>
cf. <Ze9eX-aaWoVaqsPP@google.com>

* ps/reftable-reflog-iteration-perf:
  refs/reftable: track last log record name via strbuf
  reftable/record: use scratch buffer when decoding records
  reftable/record: reuse message when decoding log records
  reftable/record: reuse refnames when decoding log records
  reftable/record: avoid copying author info
  reftable/record: convert old and new object IDs to arrays
  refs/reftable: reload correct stack when creating reflog iter
2024-03-21 14:55:13 -07:00
Junio C Hamano 3eba921f81 Merge branch 'ps/reftable-block-search-fix'
The reftable code has its own custom binary search function whose
comparison callback has an unusual interface that caused the binary
search to degenerate into a linear search; this has been corrected.

* ps/reftable-block-search-fix:
  reftable/block: fix binary search over restart counter
  reftable/record: fix memory leak when decoding object records
2024-03-21 14:55:12 -07:00
Junio C Hamano 330ed38a2d Merge branch 'ps/reftable-stack-tempfile'
The code in the reftable backend that creates new table files has been
hooked into the tempfile framework to avoid leaving cruft behind after
a failure.

* ps/reftable-stack-tempfile:
  reftable/stack: register compacted tables as tempfiles
  reftable/stack: register lockfiles during compaction
  reftable/stack: register new tables as tempfiles
  lockfile: report when rollback fails
2024-03-21 14:55:12 -07:00
Patrick Steinhardt 0068aa7946 reftable: fix tests being broken by NFS' delete-after-close semantics
It was reported that the reftable unit tests in t0032 fail with the
following assertion when running on top of NFS:

    running test_reftable_stack_compaction_concurrent_clean
    reftable/stack_test.c: 1063: failed assertion count_dir_entries(dir) == 2
    Aborted

Setting a breakpoint immediately before the assertion in fact shows the
following list of files:

    ./stack_test-1027.QJBpnd
    ./stack_test-1027.QJBpnd/0x000000000001-0x000000000003-dad7ac80.ref
    ./stack_test-1027.QJBpnd/.nfs000000000001729f00001e11
    ./stack_test-1027.QJBpnd/tables.list

Note the weird ".nfs*" file? This file is maintained by NFS clients in
order to emulate delete-after-last-close semantics that we rely on in
the reftable code [1]. Instead of unlinking the file right away and
keeping it open in the client, the NFS client will rename it to ".nfs*"
and then delete that temporary file when the last reference to it gets
dropped. Quoting the NFS FAQ:

    > D2. What is a "silly rename"? Why do these .nfsXXXXX files keep
    > showing up?
    >
    > A. Unix applications often open a scratch file and then unlink it.
    > They do this so that the file is not visible in the file system name
    > space to any other applications, and so that the system will
    > automatically clean up (delete) the file when the application exits.
    > This is known as "delete on last close", and is a tradition among
    > Unix applications.
    >
    > Because of the design of the NFS protocol, there is no way for a
    > file to be deleted from the name space but still remain in use by an
    > application. Thus NFS clients have to emulate this using what
    > already exists in the protocol. If an open file is unlinked, an NFS
    > client renames it to a special name that looks like ".nfsXXXXX".
    > This "hides" the file while it remains in use. This is known as a
    > "silly rename." Note that NFS servers have nothing to do with this
    > behavior.

This of course throws off the assertion that we got exactly two files in
that directory.

The test in question triggers this behaviour by holding two open file
descriptors to the "tables.list" file. One of the references is because
we are about to append to the stack, whereas the other reference is
because we want to compact it. As the compaction has just finished we
already rewrote "tables.list" to point to the new contents, but the
other file descriptor pointing to the old version is still open. Thus we
trigger the delete-after-last-close emulation.

Furthermore, it was reported that this behaviour only triggers with
4f36b8597c (reftable/stack: fix race in up-to-date check, 2024-01-18).
This is expected as well because it is the first point in time where we
actually keep the "tables.list" file descriptor open for the stat cache.

Fix this bug by skipping over any files that start with a leading dot
when counting files. While we could explicitly check for a prefix of
".nfs", other network file systems like SMB for example do the same
trickery but with a ".smb" prefix. In any case though, this loosening of
the assertion should be fine given that the reftable library would never
write files with leading dots by itself.

[1]: https://nfs.sourceforge.net/#faq_d2
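
A sketch of the loosened test helper using the POSIX dirent API; error
handling is trimmed for brevity:

```
#include <dirent.h>

static int count_dir_entries(const char *dirname)
{
	DIR *dir = opendir(dirname);
	struct dirent *d;
	int len = 0;

	if (!dir)
		return 0;

	while ((d = readdir(dir))) {
		/*
		 * Besides "." and "..", this also skips over any other
		 * file with a leading dot, such as the ".nfs*" files
		 * that NFS clients create for silly renames.
		 */
		if (d->d_name[0] == '.')
			continue;
		len++;
	}

	closedir(dir);
	return len;
}
```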

Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-21 10:32:21 -07:00
Junio C Hamano 667b545c62 Merge branch 'ps/reftable-stack-tempfile' into ps/pack-refs-auto
* ps/reftable-stack-tempfile:
  reftable/stack: register compacted tables as tempfiles
  reftable/stack: register lockfiles during compaction
  reftable/stack: register new tables as tempfiles
  lockfile: report when rollback fails
2024-03-18 13:24:32 -07:00
Patrick Steinhardt fffd981ec2 reftable/block: fix binary search over restart counter
Records store their keys prefix-compressed. As many records will share a
common prefix (e.g. "refs/heads/"), this can end up saving quite a bit
of disk space. The downside of this is that it is not possible to just
seek into the middle of a block and consume the corresponding record
because it may depend on prefixes read from preceding records.

To help with this usecase, the reftable format writes every n'th record
without using prefix compression, which is called a "restart". The list
of restarts is stored at the end of each block so that a reader can
figure out entry points at which to read a full record without having to
read all preceding records.

This allows us to do a binary search over the records in a block when
searching for a particular key by iterating through the restarts until
we have found the section in which our record must be located. From
thereon we perform a linear search to locate the desired record.

This mechanism is broken though. In `block_reader_seek()` we call
`binsearch()` over the count of restarts in the current block. The
function we pass to compare records with each other computes the key at
the current index and then compares it to our search key by calling
`strbuf_cmp()`, returning its result directly. But `binsearch()` expects
us to return a truish value that indicates whether the current index is
smaller than the searched-for key. And unless our key exactly matches
the value at the restart counter we always end up returning a truish
value.

The consequence is that `binsearch()` essentially always returns 0,
indicating to us that we must start searching right at the beginning of
the block. This happens to work because we then always do a linear scan
from the start of the block, and thus we would still end up finding the
desired record. But needless to say, this makes the optimization quite
useless.

Fix this bug by returning whether the current key is smaller than the
searched key. As the buggy behaviour still produced correct results, it
is not possible to write a test. Furthermore, it is also not really
possible to demonstrate in a benchmark that this fix speeds up seeking
records.
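
A sketch of the corrected comparison callback; the struct fields and the
restart-key helper are hypothetical stand-ins for the code in
reftable/block.c:

```
struct restart_find_args {
	int error;
	struct strbuf key;      /* the key we are searching for */
	struct block_reader *r;
};

static int restart_key_less(size_t idx, void *args)
{
	struct restart_find_args *a = args;
	struct strbuf rkey = STRBUF_INIT;
	int result;

	/* Hypothetical helper that reads the key at restart point idx. */
	if (block_reader_restart_key(a->r, idx, &rkey) < 0) {
		a->error = 1;
		strbuf_release(&rkey);
		return -1;
	}

	/*
	 * binsearch() wants "is the restart key smaller than the search
	 * key?", not the raw three-way result of strbuf_cmp().
	 */
	result = strbuf_cmp(&rkey, &a->key) < 0;
	strbuf_release(&rkey);
	return result;
}
```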

This may cause the reader to question whether this binary search makes
sense in the first place if it doesn't even help with performance. But
it would end up helping if we were to read a reftable with a much larger
block size. Blocks can be up to 16MB in size, in which case it will
become much more important to avoid the linear scan. We are not yet
ready to read or write such larger blocks though, so we have to live
without a benchmark demonstrating this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-07 13:59:36 -08:00
Patrick Steinhardt 1a03591812 reftable/record: fix memory leak when decoding object records
When decoding records it is customary to reuse a `struct
reftable_ref_record` across calls. Thus, it may happen that the record
already holds some allocated memory. When decoding ref and log records
we handle this by releasing or reallocating held memory. But we fail to
do this for object records, which causes us to leak memory.

Fix this memory leak by releasing object records before we decode into
them. We may eventually want to reuse memory instead to avoid needless
reallocations. But for now, let's just plug the leak and be done.
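
A sketch of the shape of the fix; the decode signature is abbreviated
from the record vtable in reftable/record.c:

```
static int reftable_obj_record_decode(struct reftable_obj_record *obj,
				      struct string_view in)
{
	/*
	 * The caller may reuse the record across decodes, so it can
	 * still hold an allocated offsets array from the previous call.
	 * Release it first instead of blindly overwriting the pointer.
	 */
	reftable_obj_record_release(obj);

	/* ... decode `in` into the now-zeroed record ... */
	return 0;
}
```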

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-07 13:59:19 -08:00
Patrick Steinhardt 60c4c42515 reftable/stack: register compacted tables as tempfiles
We do not register tables resulting from stack compaction with the
tempfile API. Those tables will thus not be deleted in case Git gets
killed.

Refactor the code to register compacted tables as tempfiles.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-07 12:34:14 -08:00
Patrick Steinhardt 3a60f6a2c4 reftable/stack: register lockfiles during compaction
We do not register any of the locks we acquire when compacting the
reftable stack via our lockfiles interfaces. These locks will thus not
be released when Git gets killed.

Refactor the code to register locks as lockfiles.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-07 12:34:13 -08:00
Patrick Steinhardt 1920d17a99 reftable/stack: register new tables as tempfiles
We do not register new tables which we're about to add to the stack with
the tempfile API. Those tables will thus not be deleted in case Git gets
killed.

Refactor the code to register tables as tempfiles.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-07 12:34:13 -08:00
Patrick Steinhardt 7b8abc4d8c reftable/record: use scratch buffer when decoding records
When decoding log records we need a temporary buffer to decode the
reflog entry's name, mail address and message. As this buffer is local
to the function, we have to reallocate it for every single log record
we are about to decode, which is inefficient.

Refactor the code such that callers need to pass in a scratch buffer,
which allows us to reuse it for multiple decodes. This reduces the
number of allocations when iterating through reflogs. Before:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 2,068,487 allocs, 2,068,365 frees, 305,122,946 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 1,068,485 allocs, 1,068,363 frees, 281,122,886 bytes allocated

Note that this commit also drops some redundant calls to `strbuf_reset()`
right before calling `decode_string()`. The latter already knows to
reset the buffer, so there is no need for these.
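
The interface change boils down to threading one extra parameter through
the decode functions; names are abbreviated from the record vtable:

```
/* Before: every decode allocates and frees its own temporary buffer. */
int log_record_decode(struct reftable_record *rec, struct string_view in);

/* After: the caller owns a scratch buffer that is reused across calls. */
int log_record_decode(struct reftable_record *rec, struct string_view in,
		      struct strbuf *scratch);
```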

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-05 09:10:06 -08:00
Patrick Steinhardt e0bd13beea reftable/record: reuse message when decoding log records
Same as with the preceding commit, we can allocate log messages as
needed when decoding log records, thus further reducing the number of
allocations.
Before:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 3,068,488 allocs, 3,068,366 frees, 307,122,961 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 2,068,487 allocs, 2,068,365 frees, 305,122,946 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-05 09:10:06 -08:00
Patrick Steinhardt 193fcb3ff8 reftable/record: reuse refnames when decoding log records
When decoding log records we always reallocate their refname arrays.
This results in quite a lot of needless allocation churn.

Refactor the code to grow the array only as required. Like this, we
should usually only end up reallocating the array a small handful of
times when iterating over many refs. Before:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 4,068,487 allocs, 4,068,365 frees, 332,011,793 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 3,068,488 allocs, 3,068,366 frees, 307,122,961 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-05 09:10:06 -08:00
Patrick Steinhardt 01639ec148 reftable/record: avoid copying author info
Each reflog entry contains information regarding the authorship of who
has made the change. This authorship information is not the same as that
of any of the commits that the reflog entry references, but instead
corresponds to the local user that has executed the command. Thus, it is
almost always the case that all reflog entries have the same author.

We can make use of this fact when decoding reftable records: instead of
freeing and then reallocating the authorship information of log records,
we can special-case when the next record during an iteration has the
exact same authorship as the preceding record. If so, then there is no
need to reallocate the respective fields.

This change results in two allocations less per log record that we're
iterating over in the most common case. Before:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 6,068,489 allocs, 6,068,367 frees, 361,011,822 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 4,068,487 allocs, 4,068,365 frees, 332,011,793 bytes allocated

An alternative would be to store the capacity of both name and email and
then use `REFTABLE_ALLOC_GROW()` to conditionally reallocate the array.
But reftable records are copied around quite a lot, and thus we need to
be a bit mindful of the overall record size. Furthermore, a memory
comparison should also be more efficient than having to copy over memory
even if we wouldn't have to allocate a new array every time.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-05 09:10:06 -08:00
Patrick Steinhardt 87ff723018 reftable/record: convert old and new object IDs to arrays
In 7af607c58d (reftable/record: store "val1" hashes as static arrays,
2024-01-03) and b31e3cc620 (reftable/record: store "val2" hashes as
static arrays, 2024-01-03) we have converted ref records to store their
object IDs in a static array. Convert log records to do the same so that
their old and new object IDs are arrays, too.

This change results in two allocations less per log record that we're
iterating over. Before:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 8,068,495 allocs, 8,068,373 frees, 401,011,862 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 6,068,489 allocs, 6,068,367 frees, 361,011,822 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-05 09:10:06 -08:00
Patrick Steinhardt f1bf54aee3 reftable: allow inlining of a few functions
We have a few functions which are basically just accessors to
structures. As those functions are executed inside the hot loop when
iterating through many refs, the fact that they cannot be inlined is
costing us some performance.

Move the function definitions into their respective headers so that they
can be inlined. This results in a performance improvement when iterating
over 1 million refs:

    Benchmark 1: show-ref: single matching ref (revision = HEAD~)
      Time (mean ± σ):     105.9 ms ±   3.6 ms    [User: 103.0 ms, System: 2.8 ms]
      Range (min … max):   103.1 ms … 133.4 ms    1000 runs

    Benchmark 2: show-ref: single matching ref (revision = HEAD)
      Time (mean ± σ):     100.7 ms ±   3.4 ms    [User: 97.8 ms, System: 2.8 ms]
      Range (min … max):    97.8 ms … 124.0 ms    1000 runs

    Summary
      show-ref: single matching ref (revision = HEAD) ran
        1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
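
A sketch of the pattern for one such accessor; the function shown is
hypothetical, but the mechanism is plain C: a definition that is visible
in the header can be inlined at every call site in the hot loop.

```
/* Before, in reftable/record.c: opaque to callers, never inlined. */
uint8_t reftable_record_type(struct reftable_record *rec)
{
	return rec->type;
}

/* After, in reftable/record.h: */
static inline uint8_t reftable_record_type(struct reftable_record *rec)
{
	return rec->type;
}
```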

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 10:19:49 -08:00
Patrick Steinhardt daf4f43d0d reftable/record: decode keys in place
When reading a record from a block, we need to decode the record's key.
As reftable keys are prefix-compressed, meaning they reuse a prefix from
the preceding record's key, this is a bit more involved than just having
to copy the relevant bytes: we need to figure out the prefix and suffix
lengths, copy the prefix from the preceding record and finally copy the
suffix from the current record.

This is done by passing three buffers to `reftable_decode_key()`: one
buffer that holds the result, one buffer that holds the last key, and
one buffer that points to the current record. The final key is then
assembled by calling `strbuf_add()` twice to copy over the prefix and
suffix.

Performing two memory copies is inefficient though. And we can indeed do
better by decoding keys in place. Instead of providing two buffers, the
caller now passes a single buffer that is already pre-populated with
the last key. Like this, we only have to call `strbuf_setlen()` to trim
the record to its prefix and then `strbuf_add()` to add the suffix.
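
In code, the hot path of this idea is just two calls, assuming `key`
enters the function still holding the previous record's key and the
prefix and suffix lengths have already been decoded from the record:

```
/* Trim the pre-populated buffer down to the shared prefix... */
strbuf_setlen(key, prefix_len);
/* ...and append this record's verbatim suffix. */
strbuf_add(key, in.buf, suffix_len);
```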

This refactoring leads to a noticeable performance bump when iterating
over 1 million refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     112.2 ms ±   3.9 ms    [User: 109.3 ms, System: 2.8 ms]
    Range (min … max):   109.2 ms … 149.6 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     106.0 ms ±   3.5 ms    [User: 103.2 ms, System: 2.7 ms]
    Range (min … max):   103.2 ms … 133.7 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.06 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 10:19:49 -08:00
Patrick Steinhardt 6620f9134c reftable/record: reuse refname when copying
Do the same optimization as in the preceding commit, but this time for
`reftable_record_copy()`. While not as noticeable, it still results in a
small speedup when iterating over 1 million refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     114.0 ms ±   3.8 ms    [User: 111.1 ms, System: 2.7 ms]
    Range (min … max):   110.9 ms … 144.3 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     112.5 ms ±   3.7 ms    [User: 109.5 ms, System: 2.8 ms]
    Range (min … max):   109.2 ms … 140.7 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.01 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 10:19:49 -08:00
Patrick Steinhardt 71d9a2e991 reftable/record: reuse refname when decoding
When decoding a reftable record we will first release the user-provided
record and then decode the new record into it. This is quite inefficient
as we basically need to reallocate at least the refname every time.

Refactor the function to start tracking the refname capacity. Like this,
we can stow away the refname, release, restore and then grow the refname
to the required number of bytes via `REFTABLE_ALLOC_GROW()`.

This refactoring is safe to do because all functions that assign to
the refname will first call `reftable_ref_record_release()`, which will
zero out the complete record after releasing memory.

This change results in a nice speedup when iterating over 1 million
refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)

    Time (mean ± σ):     124.0 ms ±   3.9 ms    [User: 121.1 ms, System: 2.7 ms]
    Range (min … max):   120.4 ms … 152.7 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     114.4 ms ±   3.7 ms    [User: 111.5 ms, System: 2.7 ms]
    Range (min … max):   111.0 ms … 152.1 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.08 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Furthermore, with this change we now perform a mostly constant number of
allocations when iterating. Before this change:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 1,006,620 allocs, 1,006,495 frees, 25,398,363 bytes allocated

After this change:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 6,623 allocs, 6,498 frees, 509,592 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 10:19:40 -08:00
Patrick Steinhardt 080f8c4565 reftable/merged: avoid duplicate pqueue emptiness check
When calling `merged_iter_next_void()` we first check whether the iter
has been exhausted already. We already perform this check two levels
down the stack in `merged_iter_next_entry()` though, which makes this
check redundant.

Now if this check was there to accelerate the common case it might have
made sense to keep it. But the iterator being exhausted is rather the
uncommon case because you can expect most reftable stacks to contain
more than two refs.

Simplify the code by removing the check. As `merged_iter_next_void()` is
basically empty except for calling `merged_iter_next()` now, merge these
two functions. This also results in a tiny speedup when iterating over
many refs:

    Benchmark 1: show-ref: single matching ref (revision = HEAD~)
      Time (mean ± σ):     125.6 ms ±   3.8 ms    [User: 122.7 ms, System: 2.8 ms]
      Range (min … max):   122.4 ms … 153.4 ms    1000 runs

    Benchmark 2: show-ref: single matching ref (revision = HEAD)
      Time (mean ± σ):     124.0 ms ±   3.9 ms    [User: 121.1 ms, System: 2.8 ms]
      Range (min … max):   120.1 ms … 156.4 ms    1000 runs

    Summary
      show-ref: single matching ref (revision = HEAD) ran
        1.01 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 10:19:40 -08:00
Patrick Steinhardt f8c1a8e2e1 reftable/merged: circumvent pqueue with single subiter
The merged iterator uses a priority queue to order records so that we
can yield them in the expected order. This priority queue of course
comes with some overhead as we need to add, compare and remove entries
in that priority queue.

In the general case, that overhead cannot really be avoided. But when we
have a single subiter left then there is no need to use the priority
queue anymore because the order is exactly the same as what that subiter
would return.

While having a single subiter may sound like an edge case, it happens
more frequently than one might think. In the most common scenario, you
can expect a repository to have a single large table that contains most
of the records and then a set of smaller tables which contain later
additions to the reftable stack. In this case it is quite likely that we
exhaust subiters of those smaller stacks before exhausting the large
table.

Special-case this and return records directly from the remaining
subiter. This results in a sizeable speedup when iterating over 1m refs
in a repository with a single table:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     135.4 ms ±   4.4 ms    [User: 132.5 ms, System: 2.8 ms]
    Range (min … max):   131.0 ms … 166.3 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     126.3 ms ±   3.9 ms    [User: 123.3 ms, System: 2.8 ms]
    Range (min … max):   122.7 ms … 157.0 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.07 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
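
A sketch of the fast path, with field names loosely following
reftable/merged.c; once the priority queue is empty and only a single
subiter is still being advanced, its records already come out in the
right order:

```
if (merged_iter_pqueue_is_empty(mi->pq) && mi->advance_index >= 0) {
	/*
	 * Single subiter left: bypass the priority queue and pull
	 * records straight from it.
	 */
	struct merged_subiter *subiter = &mi->subiters[mi->advance_index];
	return iterator_next(&subiter->iter, rec);
}
/* ... otherwise fall through to the priority queue ... */
```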

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 10:19:40 -08:00
Patrick Steinhardt 3b6dd6ad1d reftable/merged: handle subiter cleanup on close only
When advancing one of the subiters fails we immediately release
resources associated with that subiter. This is not necessary though as
we will release these resources when closing the merged iterator anyway.

Drop the logic and only release resources when the merged iterator is
done. This is a mere cleanup that should help reduce the cognitive load
when reading through the code.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 10:19:39 -08:00
Patrick Steinhardt 2d71a1d4a2 reftable/merged: remove unnecessary null check for subiters
Whenever we advance a subiter we first call `iterator_is_null()`. This
is not needed though because we only ever advance subiters which have
entries in the priority queue, and we do not add entries to the priority
queue once a subiter has been exhausted.

Drop the check as well as the now-unused function. This results in a
surprisingly big speedup:

    Benchmark 1: show-ref: single matching ref (revision = HEAD~)
      Time (mean ± σ):     138.1 ms ±   4.4 ms    [User: 135.1 ms, System: 2.8 ms]
      Range (min … max):   133.4 ms … 167.3 ms    1000 runs

    Benchmark 2: show-ref: single matching ref (revision = HEAD)
      Time (mean ± σ):     134.4 ms ±   4.2 ms    [User: 131.5 ms, System: 2.8 ms]
      Range (min … max):   130.0 ms … 164.0 ms    1000 runs

    Summary
      show-ref: single matching ref (revision = HEAD) ran
        1.03 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 10:19:39 -08:00
Patrick Steinhardt bb2d6be4c1 reftable/merged: make subiters own their records
For each subiterator, the merged table needs to track its current
record. This record is owned by the priority queue though instead of by
the merged iterator. This is not optimal performance-wise.

For one, we need to move around records whenever we add or remove a
record from the priority queue. Thus, the bigger the entries the more
bytes we need to copy around. And compared to pointers, a reftable
record is rather on the bigger side. The other issue is that this makes
it harder to reuse the records.

Refactor the code so that the merged iterator tracks ownership of the
records per-subiter. Instead of having records in the priority queue, we
can now use mere pointers to the per-subiter records. This also allows
us to swap records between the caller and the per-subiter record instead
of doing an actual copy via `reftable_record_copy_from()`, which removes
the need to release the caller-provided record.
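
A sketch of the ownership change; the struct layout is simplified, and
`SWAP()` is Git's generic swap macro:

```
struct merged_subiter {
	struct reftable_iterator iter;
	struct reftable_record rec; /* owned by the subiter, not the queue */
};

struct pq_entry {
	size_t index;
	struct reftable_record *rec; /* mere pointer into a subiter's record */
};

static void merged_iter_yield(struct pq_entry entry,
			      struct reftable_record *out)
{
	/*
	 * Hand the winning record to the caller by swapping payloads
	 * instead of deep-copying via reftable_record_copy_from().
	 */
	SWAP(*out, *entry.rec);
}
```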

This results in a noticeable speedup when iterating through many refs.
The following benchmark iterates through 1 million refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     145.5 ms ±   4.5 ms    [User: 142.5 ms, System: 2.8 ms]
    Range (min … max):   141.3 ms … 177.0 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     139.0 ms ±   4.7 ms    [User: 136.1 ms, System: 2.8 ms]
    Range (min … max):   134.2 ms … 182.2 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

This refactoring also allows a subsequent refactoring where we start
reusing memory allocated by the reftable records because we do not need
to release the caller-provided record anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 10:19:39 -08:00
Patrick Steinhardt aad8ad6fe1 reftable/merged: advance subiter on subsequent iteration
When advancing the merged iterator, we pop the topmost entry from its
priority queue and then advance the sub-iterator that the entry belongs
to, adding the result as a new entry. This is quite sensible in the case
where the merged iterator is used to actually iterate through records.
But the merged iterator is also used when we look up a single record,
only, so advancing the sub-iterator is wasted effort because we would
never even look at the result.

Instead of immediately advancing the sub-iterator, we can also defer
this to the next iteration of the merged iterator by storing the
intent-to-advance. This results in a small speedup when reading many
records. The following benchmark creates 10000 refs, which will also end
up with many ref lookups:

    Benchmark 1: update-ref: create many refs (revision = HEAD~)
      Time (mean ± σ):     337.2 ms ±   7.3 ms    [User: 200.1 ms, System: 136.9 ms]
      Range (min … max):   329.3 ms … 373.2 ms    100 runs

    Benchmark 2: update-ref: create many refs (revision = HEAD)
      Time (mean ± σ):     332.5 ms ±   5.9 ms    [User: 197.2 ms, System: 135.1 ms]
      Range (min … max):   327.6 ms … 359.8 ms    100 runs

    Summary
      update-ref: create many refs (revision = HEAD) ran
        1.01 ± 0.03 times faster than update-ref: create many refs (revision = HEAD~)
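
A sketch of the deferred advance, with helper names loosely following
reftable/merged.c; everything except the `advance_index` bookkeeping is
elided:

```
static int merged_iter_next_entry(struct merged_iter *mi,
				  struct reftable_record *rec)
{
	struct pq_entry entry;
	int err;

	/* Lazily advance the subiter whose record we yielded last time. */
	if (mi->advance_index >= 0) {
		err = merged_iter_advance_subiter(mi, mi->advance_index);
		if (err)
			return err;
		mi->advance_index = -1;
	}

	if (merged_iter_pqueue_is_empty(mi->pq))
		return 1; /* positive value: iteration is exhausted */

	entry = merged_iter_pqueue_remove(&mi->pq);
	/* ... drop records shadowed by newer tables, yield into *rec ... */

	/* Remember the winner; advance it only if we get called again. */
	mi->advance_index = (int)entry.index;
	return 0;
}
```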

While this speedup alone isn't really worth it, this refactoring will
also allow two additional optimizations in subsequent patches. First, it
will allow us to special-case when there is only a single sub-iter left
to circumvent the priority queue altogether. And second, it makes it
easier to avoid copying records to the caller.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 10:19:30 -08:00
Patrick Steinhardt 48929d2e47 reftable/merged: make merged_iter structure private
The `merged_iter` structure is not used anywhere outside of "merged.c",
but is declared in its header. Move it into the code file so that it is
clear that its implementation details are never exposed to anything.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 10:19:30 -08:00
Patrick Steinhardt 5c11529c66 reftable/pq: use size_t to track iterator index
The reftable priority queue is used by the merged iterator to yield
records from its sub-iterators in the expected order. Each entry has a
record corresponding to such a sub-iterator as well as an index that
indicates which sub-iterator the record belongs to. But while the
sub-iterators are tracked with a `size_t`, we store the index as an
`int` in the entry.

Fix this and use `size_t` consistently.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 10:19:30 -08:00
Junio C Hamano 9f67cbd0a7 Merge branch 'ps/reftable-iteration-perf'
The code to iterate over refs with the reftable backend has seen
some optimization.

* ps/reftable-iteration-perf:
  reftable/reader: add comments to `table_iter_next()`
  reftable/record: don't try to reallocate ref record name
  reftable/block: swap buffers instead of copying
  reftable/pq: allocation-less comparison of entry keys
  reftable/merged: skip comparison for records of the same subiter
  reftable/merged: allocation-less dropping of shadowed records
  reftable/record: introduce function to compare records by key
2024-02-26 18:10:24 -08:00
Junio C Hamano a1cd814f1f Merge branch 'jc/comment-style-fixes' into maint-2.43
Rewrite //-comments to /* comments */ in files whose comments
predominantly use the latter.

* jc/comment-style-fixes:
  reftable/pq_test: comment style fix
  merge-ort.c: comment style fix
  builtin/worktree: comment style fixes
2024-02-13 14:44:48 -08:00
Junio C Hamano f424d7c33d Merge branch 'ps/reftable-styles'
Code clean-up in various reftable code paths.

* ps/reftable-styles:
  reftable/record: improve semantics when initializing records
  reftable/merged: refactor initialization of iterators
  reftable/merged: refactor seeking of records
  reftable/stack: use `size_t` to track stack length
  reftable/stack: use `size_t` to track stack slices during compaction
  reftable/stack: index segments with `size_t`
  reftable/stack: fix parameter validation when compacting range
  reftable: introduce macros to allocate arrays
  reftable: introduce macros to grow arrays
2024-02-12 13:16:10 -08:00
Junio C Hamano cf4a3bd8f1 Merge branch 'ps/reftable-multi-level-indices-fix'
Writing of multi-level indices for reftable has been corrected.

* ps/reftable-multi-level-indices-fix:
  reftable: document reading and writing indices
  reftable/writer: fix writing multi-level indices
  reftable/writer: simplify writing index records
  reftable/writer: use correct type to iterate through index entries
  reftable/reader: be more careful about errors in indexed seeks
2024-02-12 13:16:10 -08:00
Patrick Steinhardt c68ca7abd3 reftable/reader: add comments to table_iter_next()
While working on the optimizations in the preceding patches I stumbled
upon `table_iter_next()` multiple times. It is quite easy to miss the
fact that we don't call `table_iter_next_in_block()` twice, but that the
second call is in fact `table_iter_next_block()`.

Add comments to explain what exactly is going on here to make things
more obvious. While at it, touch up the code to conform to our code
style better.

Note that one of the refactorings merges two conditional blocks into
one. Before, we had the following code:

```
err = table_iter_next_block(&next, ti);
if (err != 0) {
	ti->is_finished = 1;
}
table_iter_block_done(ti);
if (err != 0) {
	return err;
}
```

As `table_iter_block_done()` does not care about `is_finished`, the
conditional blocks can be merged into one block:

```
err = table_iter_next_block(&next, ti);
table_iter_block_done(ti);
if (err != 0) {
	ti->is_finished = 1;
	return err;
}
```

This is both easier to reason about and more performant because we have
one branch less.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12 09:19:27 -08:00