Commit graph

22 commits

Author SHA1 Message Date
Junio C Hamano d25ad94df6 Merge branch 'ps/ci-test-with-jgit'
Tests to ensure interoperability between reftable written by jgit
and our code have been added and enabled in CI.

* ps/ci-test-with-jgit:
  t0612: add tests to exercise Git/JGit reftable compatibility
  t0610: fix non-portable variable assignment
  t06xx: always execute backend-specific tests
  ci: install JGit dependency
  ci: make Perforce binaries executable for all users
  ci: merge scripts which install dependencies
  ci: fix setup of custom path for GitLab CI
  ci: merge custom PATH directories
  ci: convert "install-dependencies.sh" to use "/bin/sh"
  ci: drop duplicate package installation for "linux-gcc-default"
  ci: skip sudo when we are already root
  ci: expose distro name in dockerized GitHub jobs
  ci: rename "runs_on_pool" to "distro"
2024-05-08 10:18:44 -07:00
Junio C Hamano 5aec7231c8 Merge branch 'ps/reftable-write-optim'
Code to write out reftable has seen some optimization and
simplification.

* ps/reftable-write-optim:
  reftable/block: reuse compressed array
  reftable/block: reuse zstream when writing log blocks
  reftable/writer: reset `last_key` instead of releasing it
  reftable/writer: unify releasing memory
  reftable/writer: refactorings for `writer_flush_nonempty_block()`
  reftable/writer: refactorings for `writer_add_record()`
  refs/reftable: don't recompute committer ident
  reftable: remove name checks
  refs/reftable: skip duplicate name checks
  refs/reftable: perform explicit D/F check when writing symrefs
  refs/reftable: fix D/F conflict error message on ref copy
2024-05-08 10:18:43 -07:00
Junio C Hamano 82a31ec324 Merge branch 'jt/reftable-geometric-compaction'
The strategy to compact multiple tables of reftables after many
operations accumulate many entries has been improved to avoid
accumulating too many tables uncollected.

* jt/reftable-geometric-compaction:
  reftable/stack: use geometric table compaction
  reftable/stack: add env to disable autocompaction
  reftable/stack: expose option to disable auto-compaction
2024-04-16 14:50:30 -07:00
Junio C Hamano 92e8388bd3 Merge branch 'jc/local-extern-shell-rules'
Document and apply workaround for a buggy version of dash that
mishandles "local var=val" construct.

* jc/local-extern-shell-rules:
  t1016: local VAR="VAL" fix
  t0610: local VAR="VAL" fix
  t: teach lint that RHS of 'local VAR=VAL' needs to be quoted
  t: local VAR="VAL" (quote ${magic-reference})
  t: local VAR="VAL" (quote command substitution)
  t: local VAR="VAL" (quote positional parameters)
  CodingGuidelines: quote assigned value in 'local var=$val'
  CodingGuidelines: describe "export VAR=VAL" rule
2024-04-16 14:50:27 -07:00
Junio C Hamano 372aabe912 Merge branch 'ps/t0610-umask-fix'
The "shared repository" test in the t0610 reftable test failed
under restrictive umask setting (e.g. 007), which has been
corrected.

* ps/t0610-umask-fix:
  t0610: execute git-pack-refs(1) with specified umask
  t0610: make `--shared=` tests reusable
2024-04-15 14:11:43 -07:00
Patrick Steinhardt db1d63bf57 t0610: fix non-portable variable assignment
Older versions of the Dash shell fail to parse `local var=val`
assignments in some cases when `val` is unquoted. Such failures can be
observed e.g. with Ubuntu 20.04 and older, which has a Dash version that
still has this bug.

Such an assignment has been introduced in t0610. The issue wasn't
detected for a while because this test used to only run when the
GIT_TEST_DEFAULT_REF_FORMAT environment variable was set to "reftable".
We have dropped that requirement now though, meaning that it runs
unconditionally, including on jobs which use such older versions of
Ubuntu.

We have worked around such issues in the past, e.g. in ebee5580ca
(parallel-checkout: avoid dash local bug in tests, 2021-06-06), by
quoting the `val` side. Apply the same fix to t0610.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-12 08:47:51 -07:00
Patrick Steinhardt ca13c3e94a t06xx: always execute backend-specific tests
The tests in t06xx exercise specific ref formats. Next to probing some
basic functionality, these tests also exercise other low-level details
specific to the format. Those tests are only executed though in case
`GIT_TEST_DEFAULT_REF_FORMAT` is set to the ref format of the respective
backend-under-test.

Ideally, we would run the full test matrix for ref formats such that our
complete test suite is executed with every supported format on every
supported platform. This is quite an expensive undertaking though, and
thus we only execute e.g. the "reftable" tests on macOS and Linux. As a
result, we basically have no test coverage for the "reftable" format at
all on other platforms like Windows.

Adapt these tests so that they override `GIT_TEST_DEFAULT_REF_FORMAT`,
which means that they'll always execute. This increases test coverage on
platforms that don't run the full test matrix, which at least gives us
some basic test coverage on those platforms for the "reftable" format.

This of course comes at the cost of running those tests multiple times
on platforms where we do run the full test matrix. But arguably, this is
a good thing because it will also cause us to e.g. run those tests with
the address sanitizer and other non-standard parameters.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-12 08:47:51 -07:00
Patrick Steinhardt 69d87802da t0610: execute git-pack-refs(1) with specified umask
The tests for git-pack-refs(1) with the `core.sharedRepository` config
execute git-pack-refs(1) outside of the shell that has the expected
umask set. This is wrong because we want to test the behaviour of that
command with different umasks. The issue went unnoticed because most
distributions have a default umask of 0022, and we only ever test with
`--shared=true`, which re-adds the group write bit.

Fix the issue by moving git-pack-refs(1) into the umask'd shell and add
a bunch of test cases that exercise behaviour more thoroughly.

Note that we drop the check for whether `core.sharedRepository` was set
to the correct value to make the test setup a bit easier. We should be
able to rely on git-init(1) doing its thing correctly. Furthermore, to
help readability, we convert tests that pass `--shared=true` to instead
pass the equivalent `--shared=group`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-09 14:14:00 -07:00
Patrick Steinhardt 2f960dd5fe t0610: make --shared= tests reusable
We have two kinds of `--shared=` tests, one for git-init(1) and one for
git-pack-refs(1). Merge them into a reusable function such that we can
easily add additional testcases with different umasks and flags for the
`--shared=` switch.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-09 14:14:00 -07:00
Patrick Steinhardt 455d61b6d2 refs/reftable: perform explicit D/F check when writing symrefs
We already perform explicit D/F checks in all reftable callbacks which
write refs, except when writing symrefs. For one this leads to an error
message which isn't perfectly actionable because we only tell the user
that there was a D/F conflict, but not which refs conflicted with each
other. But second, once all ref updating callbacks explicitly check for
D/F conflicts, we can disable the D/F checks in the reftable library
itself and thus avoid some duplicated efforts.

Refactor the code that writes symref tables to explicitly call into
`refs_verify_refname_available()` when writing symrefs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08 16:59:01 -07:00
Patrick Steinhardt f57cc987a9 refs/reftable: fix D/F conflict error message on ref copy
The `write_copy_table()` function is shared between the reftable
implementations for renaming and copying refs. The only difference
between those two cases is that the rename will also delete the old
reference, whereas copying won't.

This has resulted in a bug though where we don't properly verify refname
availability. When calling `refs_verify_refname_available()`, we always
add the old ref name to the list of refs to be skipped when computing
availability, which indicates that the name would be available even if
it already exists at the current point in time. This is only the right
thing to do for renames though, not for copies.

The consequence of this bug is quite harmless because the reftable
backend has its own checks for D/F conflicts further down in the call
stack, and thus we refuse the update regardless of the bug. But all the
user gets in this case is an uninformative message that copying the ref
has failed, without any further details.

Fix the bug and only add the old name to the skip-list in case we rename
the ref. Consequently, this error case will now be handled by
`refs_verify_refname_available()`, which knows to provide a proper error
message.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08 16:59:01 -07:00
Justin Tobler a949ebd342 reftable/stack: use geometric table compaction
To reduce the number of on-disk reftables, compaction is performed.
Contiguous tables with the same binary log value of size are grouped
into segments. The segment that has both the lowest binary log value and
contains more than one table is set as the starting point when
identifying the compaction segment.

Since segments containing a single table are not initially considered
for compaction, if the table appended to the list does not match the
previous table log value, no compaction occurs for the new table. It is
therefore possible for unbounded growth of the table list. This can be
demonstrated by repeating the following sequence:

git branch -f foo
git branch -d foo

Each operation results in a new table being written with no compaction
occurring until a separate operation produces a table matching the
previous table log value.

Instead, to avoid unbounded growth of the table list, the compaction
strategy is updated to ensure tables follow a geometric sequence after
each operation by individually evaluating each table in reverse index
order. This strategy results in a much simpler and more robust algorithm
compared to the previous one while also maintaining a minimal ordered
set of tables on-disk.

When creating 10 thousand references, the new strategy has no
performance impact:

Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
  Time (mean ± σ):     26.516 s ±  0.047 s    [User: 17.864 s, System: 8.491 s]
  Range (min … max):   26.447 s … 26.569 s    10 runs

Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
  Time (mean ± σ):     26.417 s ±  0.028 s    [User: 17.738 s, System: 8.500 s]
  Range (min … max):   26.366 s … 26.444 s    10 runs

Summary
  update-ref: create refs sequentially (revision = HEAD) ran
    1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)

Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
tables and are therefore updated to specify the correct new table count.
Since compaction is more aggressive in ensuring tables maintain a
geometric sequence, the expected table count is reduced in these tests.
In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
removed because the function is no longer needed. Also, the
`test_suggest_compaction_segment()` test is updated to better showcase
and reflect the new geometric compaction behavior.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08 12:11:10 -07:00
Justin Tobler 7c8eb5928f reftable/stack: add env to disable autocompaction
In future tests it will be neccesary to create repositories with a set
number of tables. To make this easier, introduce the
`GIT_TEST_REFTABLE_AUTOCOMPACTION` environment variable that, when set
to false, disables autocompaction of reftables.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-08 12:11:10 -07:00
Junio C Hamano 26ba7477d9 t0610: local VAR="VAL" fix
The series was based on maint and fixes all the tests that exist
there, but we have acquired a few more.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-05 22:50:11 -07:00
Patrick Steinhardt 9f6714ab3e builtin/gc: pack refs when using git maintenance run --auto
When running `git maintenance run --auto`, then the various subtasks
will only run as needed. Thus, we for example end up only packing loose
objects if we hit a certain threshold.

Interestingly enough, the "pack-refs" task is actually _never_ executed
when the auto-flag is set because it does not have a condition at all.
As 41abfe15d9 (maintenance: add pack-refs task, 2021-02-09) mentions:

    The 'auto_condition' function pointer is left NULL for now. We could
    extend this in the future to have a condition check if pack-refs
    should be run during 'git maintenance run --auto'.

It is not quite clear from that quote whether it is actually intended
that the task doesn't run at all in this mode. Also, no test was added
to verify this behaviour. Ultimately though, it feels quite surprising
that `git maintenance run --auto --task=pack-refs` would quietly never
do anything at all.

In any case, now that we do have the logic in place to let ref backends
decide whether or not to repack refs, it does make sense to wire it up
accordingly. With the "reftable" backend we will thus now perform
auto-compaction, which optimizes the refdb as needed.

But for the "files" backend we now unconditionally pack refs as it does
not yet know to handle the "auto" flag. Arguably, this can be seen as a
bug fix given that previously the task never did anything at all.
Eventually though we should amend the "files" backend to use some
heuristics for auto compaction, as well.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25 09:54:07 -07:00
Patrick Steinhardt bfc2f9eb8e builtin/gc: forward git-gc(1)'s --auto flag when packing refs
Forward the `--auto` flag to git-pack-refs(1) when it has been invoked
with this flag itself. This does not change anything for the "files"
backend, which will continue to eagerly pack refs. But it does ensure
that the "reftable" backend only compacts refs as required.

This change does not impact git-maintenance(1) because this command will
in fact never run the pack-refs task when run with `--auto`. This issue
will be addressed in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25 09:54:07 -07:00
Patrick Steinhardt 6dcffc68f4 builtin/pack-refs: introduce new "--auto" flag
Calling git-pack-refs(1) will unconditionally cause it to pack all
requested refs regardless of the current state of the ref database. For
example:

  - With the "files" backend we will end up rewriting the complete
    "packed-refs" file even if only a single ref would require
    compaction.

  - With the "reftable" backend we will end up always compacting all
    tables into a single table.

This behaviour can be completely unnecessary depending on the backend
and is thus wasteful.

With the introduction of the `PACK_REFS_AUTO` flag in the preceding
commit we can improve this and let the backends decide for themselves
whether to pack refs in the first place. Expose this functionality via a
new "--auto" flag in git-pack-refs(1), which mirrors the same flag in
both git-gc(1) and git-maintenance(1).

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25 09:54:07 -07:00
Patrick Steinhardt 4ccf7060d8 refs/reftable: print errors on compaction failure
When git-pack-refs(1) fails in the reftable backend we end up printing
no error message at all, leaving the caller puzzled as to why compaction
has failed. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25 09:54:07 -07:00
Patrick Steinhardt a2f711ade0 reftable/stack: gracefully handle failed auto-compaction due to locks
Whenever we commit a new table to the reftable stack we will end up
invoking auto-compaction of the stack to keep the total number of tables
at bay. This auto-compaction may fail though in case at least one of the
tables which we are about to compact is locked. This is indicated by the
compaction function returning `REFTABLE_LOCK_ERROR`. We do not handle
this case though, and thus bubble that return value up the calling
chain, which will ultimately cause a failure.

Fix this bug by ignoring `REFTABLE_LOCK_ERROR`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25 09:54:07 -07:00
Patrick Steinhardt e0795e2c79 t0610: remove unused variable assignment
In b0f6b6b523 (refs/reftable: don't fail empty transactions in repo
without HEAD, 2024-02-27), we have added a new test to t0610. This test
contains a useless assignment to a variable that is never actually used.
Remove it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-06 08:40:40 -08:00
Patrick Steinhardt b0f6b6b523 refs/reftable: don't fail empty transactions in repo without HEAD
Under normal circumstances, it shouldn't ever happen that a repository
has no HEAD reference. In fact, git-update-ref(1) would fail any request
to delete the HEAD reference, and a newly initialized repository always
pre-creates it, too.

We have however changed git-clone(1) to partially initialize the
refdb just up to the point where remote helpers can find the
repository. With that change, we are going to run into a situation
where repositories have no refs at all.

Now there is a very particular edge case in this situation: when
preparing an empty ref transacton, we end up returning whatever value
`read_ref_without_reload()` returned to the caller. Under normal
conditions this would be fine: "HEAD" should usually exist, and thus the
function would return `0`. But if "HEAD" doesn't exist, the function
returns a positive value which we end up returning to the caller.

Fix this bug by resetting the return code to `0` and add a test.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-27 13:53:39 -08:00
Patrick Steinhardt 57db2a094d refs: introduce reftable backend
Due to scalability issues, Shawn Pearce has originally proposed a new
"reftable" format more than six years ago [1]. Initially, this new
format was implemented in JGit with promising results. Around two years
ago, we have then added the "reftable" library to the Git codebase via
a4bbd13be3 (Merge branch 'hn/reftable', 2021-12-15). With this we have
landed all the low-level code to read and write reftables. Notably
missing though was the integration of this low-level code into the Git
code base in the form of a new ref backend that ties all of this
together.

This gap is now finally closed by introducing a new "reftable" backend
into the Git codebase. This new backend promises to bring some notable
improvements to Git repositories:

  - It becomes possible to do truly atomic writes where either all refs
    are committed to disk or none are. This was not possible with the
    "files" backend because ref updates were split across multiple loose
    files.

  - The disk space required to store many refs is reduced, both compared
    to loose refs and packed-refs. This is enabled both by the reftable
    format being a binary format, which is more compact, and by prefix
    compression.

  - We can ignore filesystem-specific behaviour as ref names are not
    encoded via paths anymore. This means there is no need to handle
    case sensitivity on Windows systems or Unicode precomposition on
    macOS.

  - There is no need to rewrite the complete refdb anymore every time a
    ref is being deleted like it was the case for packed-refs. This
    means that ref deletions are now constant time instead of scaling
    linearly with the number of refs.

  - We can ignore file/directory conflicts so that it becomes possible
    to store both "refs/heads/foo" and "refs/heads/foo/bar".

  - Due to this property we can retain reflogs for deleted refs. We have
    previously been deleting reflogs together with their refs to avoid
    file/directory conflicts, which is not necessary anymore.

  - We can properly enumerate all refs. With the "files" backend it is
    not easily possible to distinguish between refs and non-refs because
    they may live side by side in the gitdir.

Not all of these improvements are realized with the current "reftable"
backend implementation. At this point, the new backend is supposed to be
a drop-in replacement for the "files" backend that is used by basically
all Git repositories nowadays. It strives for 1:1 compatibility, which
means that a user can expect the same behaviour regardless of whether
they use the "reftable" backend or the "files" backend for most of the
part.

Most notably, this means we artificially limit the capabilities of the
"reftable" backend to match the limits of the "files" backend. It is not
possible to create refs that would end up with file/directory conflicts,
we do not retain reflogs, we perform stricter-than-necessary checks.
This is done intentionally due to two main reasons:

  - It makes it significantly easier to land the "reftable" backend as
    tests behave the same. It would be tough to argue for each and every
    single test that doesn't pass with the "reftable" backend.

  - It ensures compatibility between repositories that use the "files"
    backend and repositories that use the "reftable" backend. Like this,
    hosters can migrate their repositories to use the "reftable" backend
    without causing issues for clients that use the "files" backend in
    their clones.

It is expected that these artificial limitations may eventually go away
in the long term.

Performance-wise things very much depend on the actual workload. The
following benchmarks compare the "files" and "reftable" backends in the
current version:

  - Creating N refs in separate transactions shows that the "files"
    backend is ~50% faster. This is not surprising given that creating a
    ref only requires us to create a single loose ref. The "reftable"
    backend will also perform auto compaction on updates. In real-world
    workloads we would likely also want to perform pack loose refs,
    which would likely change the picture.

        Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1)
          Time (mean ± σ):       2.1 ms ±   0.3 ms    [User: 0.6 ms, System: 1.7 ms]
          Range (min … max):     1.8 ms …   4.3 ms    133 runs

        Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1)
          Time (mean ± σ):       2.7 ms ±   0.1 ms    [User: 0.6 ms, System: 2.2 ms]
          Range (min … max):     2.4 ms …   2.9 ms    132 runs

        Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000)
          Time (mean ± σ):      1.975 s ±  0.006 s    [User: 0.437 s, System: 1.535 s]
          Range (min … max):    1.969 s …  1.980 s    3 runs

        Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000)
          Time (mean ± σ):      2.611 s ±  0.013 s    [User: 0.782 s, System: 1.825 s]
          Range (min … max):    2.597 s …  2.622 s    3 runs

        Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000)
          Time (mean ± σ):     198.442 s ±  0.241 s    [User: 43.051 s, System: 155.250 s]
          Range (min … max):   198.189 s … 198.670 s    3 runs

        Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000)
          Time (mean ± σ):     294.509 s ±  4.269 s    [User: 104.046 s, System: 190.326 s]
          Range (min … max):   290.223 s … 298.761 s    3 runs

  - Creating N refs in a single transaction shows that the "files"
    backend is significantly slower once we start to write many refs.
    The "reftable" backend only needs to update two files, whereas the
    "files" backend needs to write one file per ref.

        Benchmark 1: update-ref: create many refs (refformat = files, refcount = 1)
          Time (mean ± σ):       1.9 ms ±   0.1 ms    [User: 0.4 ms, System: 1.4 ms]
          Range (min … max):     1.8 ms …   2.6 ms    151 runs

        Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 1)
          Time (mean ± σ):       2.5 ms ±   0.1 ms    [User: 0.7 ms, System: 1.7 ms]
          Range (min … max):     2.4 ms …   3.4 ms    148 runs

        Benchmark 3: update-ref: create many refs (refformat = files, refcount = 1000)
          Time (mean ± σ):     152.5 ms ±   5.2 ms    [User: 19.1 ms, System: 133.1 ms]
          Range (min … max):   148.5 ms … 167.8 ms    15 runs

        Benchmark 4: update-ref: create many refs (refformat = reftable, refcount = 1000)
          Time (mean ± σ):      58.0 ms ±   2.5 ms    [User: 28.4 ms, System: 29.4 ms]
          Range (min … max):    56.3 ms …  72.9 ms    40 runs

        Benchmark 5: update-ref: create many refs (refformat = files, refcount = 1000000)
          Time (mean ± σ):     152.752 s ±  0.710 s    [User: 20.315 s, System: 131.310 s]
          Range (min … max):   152.165 s … 153.542 s    3 runs

        Benchmark 6: update-ref: create many refs (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):     51.912 s ±  0.127 s    [User: 26.483 s, System: 25.424 s]
          Range (min … max):   51.769 s … 52.012 s    3 runs

  - Deleting a ref in a fully-packed repository shows that the "files"
    backend scales with the number of refs. The "reftable" backend has
    constant-time deletions.

        Benchmark 1: update-ref: delete ref (refformat = files, refcount = 1)
          Time (mean ± σ):       1.7 ms ±   0.1 ms    [User: 0.4 ms, System: 1.2 ms]
          Range (min … max):     1.6 ms …   2.1 ms    316 runs

        Benchmark 2: update-ref: delete ref (refformat = reftable, refcount = 1)
          Time (mean ± σ):       1.8 ms ±   0.1 ms    [User: 0.4 ms, System: 1.3 ms]
          Range (min … max):     1.7 ms …   2.1 ms    294 runs

        Benchmark 3: update-ref: delete ref (refformat = files, refcount = 1000)
          Time (mean ± σ):       2.0 ms ±   0.1 ms    [User: 0.5 ms, System: 1.4 ms]
          Range (min … max):     1.9 ms …   2.5 ms    287 runs

        Benchmark 4: update-ref: delete ref (refformat = reftable, refcount = 1000)
          Time (mean ± σ):       1.9 ms ±   0.1 ms    [User: 0.5 ms, System: 1.3 ms]
          Range (min … max):     1.8 ms …   2.1 ms    217 runs

        Benchmark 5: update-ref: delete ref (refformat = files, refcount = 1000000)
          Time (mean ± σ):     229.8 ms ±   7.9 ms    [User: 182.6 ms, System: 46.8 ms]
          Range (min … max):   224.6 ms … 245.2 ms    6 runs

        Benchmark 6: update-ref: delete ref (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):       2.0 ms ±   0.0 ms    [User: 0.6 ms, System: 1.3 ms]
          Range (min … max):     2.0 ms …   2.1 ms    3 runs

  - Listing all refs shows no significant advantage for either of the
    backends. The "files" backend is a bit faster, but not by a
    significant margin. When repositories are not packed the "reftable"
    backend outperforms the "files" backend because the "reftable"
    backend performs auto-compaction.

        Benchmark 1: show-ref: print all refs (refformat = files, refcount = 1, packed = true)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.0 ms    1729 runs

        Benchmark 2: show-ref: print all refs (refformat = reftable, refcount = 1, packed = true)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   1.8 ms    1816 runs

        Benchmark 3: show-ref: print all refs (refformat = files, refcount = 1000, packed = true)
          Time (mean ± σ):       4.3 ms ±   0.1 ms    [User: 0.9 ms, System: 3.3 ms]
          Range (min … max):     4.1 ms …   4.6 ms    645 runs

        Benchmark 4: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = true)
          Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.0 ms, System: 3.3 ms]
          Range (min … max):     4.2 ms …   5.9 ms    643 runs

        Benchmark 5: show-ref: print all refs (refformat = files, refcount = 1000000, packed = true)
          Time (mean ± σ):      2.537 s ±  0.034 s    [User: 0.488 s, System: 2.048 s]
          Range (min … max):    2.511 s …  2.627 s    10 runs

        Benchmark 6: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = true)
          Time (mean ± σ):      2.712 s ±  0.017 s    [User: 0.653 s, System: 2.059 s]
          Range (min … max):    2.692 s …  2.752 s    10 runs

        Benchmark 7: show-ref: print all refs (refformat = files, refcount = 1, packed = false)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   1.9 ms    1834 runs

        Benchmark 8: show-ref: print all refs (refformat = reftable, refcount = 1, packed = false)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   2.0 ms    1840 runs

        Benchmark 9: show-ref: print all refs (refformat = files, refcount = 1000, packed = false)
          Time (mean ± σ):      13.8 ms ±   0.2 ms    [User: 2.8 ms, System: 10.8 ms]
          Range (min … max):    13.3 ms …  14.5 ms    208 runs

        Benchmark 10: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = false)
          Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.2 ms, System: 3.3 ms]
          Range (min … max):     4.3 ms …   6.2 ms    624 runs

        Benchmark 11: show-ref: print all refs (refformat = files, refcount = 1000000, packed = false)
          Time (mean ± σ):     12.127 s ±  0.129 s    [User: 2.675 s, System: 9.451 s]
          Range (min … max):   11.965 s … 12.370 s    10 runs

        Benchmark 12: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = false)
          Time (mean ± σ):      2.799 s ±  0.022 s    [User: 0.735 s, System: 2.063 s]
          Range (min … max):    2.769 s …  2.836 s    10 runs

  - Printing a single ref shows no real difference between the "files"
    and "reftable" backends.

        Benchmark 1: show-ref: print single ref (refformat = files, refcount = 1)
          Time (mean ± σ):       1.5 ms ±   0.1 ms    [User: 0.4 ms, System: 1.0 ms]
          Range (min … max):     1.4 ms …   1.8 ms    1779 runs

        Benchmark 2: show-ref: print single ref (refformat = reftable, refcount = 1)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   2.5 ms    1753 runs

        Benchmark 3: show-ref: print single ref (refformat = files, refcount = 1000)
          Time (mean ± σ):       1.5 ms ±   0.1 ms    [User: 0.3 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   1.9 ms    1840 runs

        Benchmark 4: show-ref: print single ref (refformat = reftable, refcount = 1000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.0 ms    1831 runs

        Benchmark 5: show-ref: print single ref (refformat = files, refcount = 1000000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.1 ms    1848 runs

        Benchmark 6: show-ref: print single ref (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.1 ms    1762 runs

So overall, performance depends on the usecases. Except for many
sequential writes the "reftable" backend is roughly on par or
significantly faster than the "files" backend though. Given that the
"files" backend has received 18 years of optimizations by now this can
be seen as a win. Furthermore, we can expect that the "reftable" backend
will grow faster over time when attention turns more towards
optimizations.

The complete test suite passes, except for those tests explicitly marked
to require the REFFILES prerequisite. Some tests in t0610 are marked as
failing because they depend on still-in-flight bug fixes. Tests can be
run with the new backend by setting the GIT_TEST_DEFAULT_REF_FORMAT
environment variable to "reftable".

There is a single known conceptual incompatibility with the dumb HTTP
transport. As "info/refs" SHOULD NOT contain the HEAD reference, and
because the "HEAD" file is not valid anymore, it is impossible for the
remote client to figure out the default branch without changing the
protocol. This shortcoming needs to be handled in a subsequent patch
series.

As the reftable library has already been introduced a while ago, this
commit message will not go into the details of how exactly the on-disk
format works. Please refer to our preexisting technical documentation at
Documentation/technical/reftable for this.

[1]: https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=S-DpZ5dR1aF+VHVETKG20OQ@mail.gmail.com/

Original-idea-by: Shawn Pearce <spearce@spearce.org>
Based-on-patch-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-07 08:28:37 -08:00