development/git - HydraGit

mirror of https://github.com/git/git synced 2024-10-02 14:45:21 +00:00

Author	SHA1	Message	Date
Patrick Steinhardt	23c32511b3	refs/reftable: stop micro-optimizing refname allocations on copy When copying refs, we execute `write_copy_table()` to write the new table. As the names are given to us via `arg->newname` and `arg->oldname`, respectively, we optimize away some allocations by assigning those fields to the reftable records we are about to write directly, without duplicating them. This requires us to cast the input to `char ` pointers as they are in fact constant strings. Later on, we then unset the refname for all of the records before calling `reftable_log_record_release()` on them. We also do this when assigning the "HEAD" constant, but here we do not cast because its type is `char[]` by default. It's about to be turned into `const char ` though once we enable `-Wwrite-strings` and will thus cause another warning. It's quite dubious whether this micro-optimization really helps. We're about to write to disk anyway, which is going to be way slower than a small handful of allocations. Let's drop the optimization altogther and instead copy arguments to simplify the code and avoid the future warning with `-Wwrite-strings`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-06-07 10:30:48 -07:00
Junio C Hamano	16a592f132	Merge branch 'ps/pseudo-ref-terminology' Terminology to call various ref-like things are getting straightened out. * ps/pseudo-ref-terminology: refs: refuse to write pseudorefs ref-filter: properly distinuish pseudo and root refs refs: pseudorefs are no refs refs: classify HEAD as a root ref refs: do not check ref existence in `is_root_ref()` refs: rename `is_special_ref()` to `is_pseudo_ref()` refs: rename `is_pseudoref()` to `is_root_ref()` Documentation/glossary: define root refs as refs Documentation/glossary: clarify limitations of pseudorefs Documentation/glossary: redefine pseudorefs as special refs	2024-05-28 11:17:06 -07:00
Junio C Hamano	86a49253a6	Merge branch 'it/refs-name-conflict' Expose "name conflict" error when a ref creation fails due to D/F conflict in the ref namespace, to improve an error message given by "git fetch". * it/refs-name-conflict: refs: return conflict error when checking packed refs	2024-05-23 11:04:27 -07:00
Junio C Hamano	4beb7a3b06	Merge branch 'kn/ref-transaction-symref' Updates to symbolic refs can now be made as a part of ref transaction. * kn/ref-transaction-symref: refs: remove `create_symref` and associated dead code refs: rename `refs_create_symref()` to `refs_update_symref()` refs: use transaction in `refs_create_symref()` refs: add support for transactional symref updates refs: move `original_update_refname` to 'refs.c' refs: support symrefs in 'reference-transaction' hook files-backend: extract out `create_symref_lock()` refs: accept symref values in `ref_transaction_update()`	2024-05-20 11:20:04 -07:00
Patrick Steinhardt	31951c2248	refs: classify HEAD as a root ref Root refs are those refs that live in the root of the ref hierarchy. Our old and venerable "HEAD" reference falls into this category, but we don't yet classify it as such in `is_root_ref()`. Adapt the function to also treat "HEAD" as a root ref. This change is safe to do for all current callers: - `ref_kind_from_refname()` already handles "HEAD" explicitly before calling `is_root_ref()`. - The "files" and "reftable" backends explicitly call both `is_root_ref()` and `is_headref()` together. This also aligns behaviour or `is_root_ref()` and `is_headref()` such that we stop checking for ref existence. This changes semantics for our backends: - In the reftable backend we already know that the ref must exist because `is_headref()` is called as part of the ref iterator. The existence check is thus redundant, and the change is safe to do. - In the files backend we use it when populating root refs, where we would skip adding the "HEAD" file if it was not possible to resolve it. The new behaviour is to instead mark "HEAD" as broken, which will cause us to emit warnings in various places. As there are no callers of `is_headref()` left afer the refactoring, we can absorb it completely into `is_root_ref()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-15 07:30:52 -07:00
Patrick Steinhardt	afcd067dad	refs: do not check ref existence in `is_root_ref()` Before this patch series, root refs except for "HEAD" and our special refs were classified as pseudorefs. Furthermore, our terminology clarified that pseudorefs must not be symbolic refs. This restriction is enforced in `is_root_ref()`, which explicitly checks that a supposed root ref resolves to an object ID without recursing. This has been extremely confusing right from the start because (in old terminology) a ref name may sometimes be a pseudoref and sometimes not depending on whether it is a symbolic or regular ref. This behaviour does not seem reasonable at all and I very much doubt that it results in anything sane. Last but not least, the current behaviour can actually lead to a segfault when calling `is_root_ref()` with a reference that either does not exist or that is a symbolic ref because we never initialized `oid`, but then read it via `is_null_oid()`. We have now changed terminology to clarify that pseudorefs are really only "MERGE_HEAD" and "FETCH_HEAD", whereas all the other refs that live in the root of the ref hierarchy are just plain refs. Thus, we do not need to check whether the ref is symbolic or not. In fact, we can now avoid looking up the ref completely as the name is sufficient for us to figure out whether something would be a root ref or not. This change of course changes semantics for our callers. As there are only three of them we can assess each of them individually: - "ref-filter.c:ref_kind_from_refname()" uses it to classify refs. It's clear that the intent is to classify based on the ref name, only. - "refs/reftable_backend.c:reftable_ref_iterator_advance()" uses it to filter root refs. Again, using existence checks is pointless here as the iterator has just surfaced the ref, so we know it does exist. - "refs/files_backend.c:add_pseudoref_and_head_entries()" uses it to determine whether it should add a ref to the root directory of its iterator. This had the effect that we skipped over any files that are either a symbolic ref, or which are not a ref at all. The new behaviour is to include symbolic refs know, which aligns us with the adapted terminology. Furthermore, files which look like root refs but aren't are now mark those as "broken". As broken refs are not surfaced by our tooling, this should not lead to a change in user-visible behaviour, but may cause us to emit warnings. This feels like the right thing to do as we would otherwise just silently ignore corrupted root refs completely. So in all cases the existence check was either superfluous, not in line with the adapted terminology or masked potential issues. This commit thus changes the behaviour as proposed and drops the existence check altogether. Add a test that verifies that this does not change user-visible behaviour. Namely, we still don't want to show broken refs to the user by default in git-for-each-ref(1). What this does allow though is for internal callers to surface dangling root refs when they pass in the `DO_FOR_EACH_INCLUDE_BROKEN` flag. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-15 07:30:52 -07:00
Patrick Steinhardt	f6936e62a5	refs: rename `is_pseudoref()` to `is_root_ref()` Rename `is_pseudoref()` to `is_root_ref()` to adapt to the newly defined terminology in our gitglossary(7). Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-15 07:30:51 -07:00
Junio C Hamano	5aec7231c8	Merge branch 'ps/reftable-write-optim' Code to write out reftable has seen some optimization and simplification. * ps/reftable-write-optim: reftable/block: reuse compressed array reftable/block: reuse zstream when writing log blocks reftable/writer: reset `last_key` instead of releasing it reftable/writer: unify releasing memory reftable/writer: refactorings for `writer_flush_nonempty_block()` reftable/writer: refactorings for `writer_add_record()` refs/reftable: don't recompute committer ident reftable: remove name checks refs/reftable: skip duplicate name checks refs/reftable: perform explicit D/F check when writing symrefs refs/reftable: fix D/F conflict error message on ref copy	2024-05-08 10:18:43 -07:00
Karthik Nayak	4865707bda	refs: remove `create_symref` and associated dead code In the previous commits, we converted `refs_create_symref()` to utilize transactions to perform symref updates. Earlier `refs_create_symref()` used `create_symref()` to do the same. We can now remove `create_symref()` and any code associated with it which is no longer used. We remove `create_symref()` code from all the reference backends and also remove it entirely from the `ref_storage_be` struct. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-07 08:51:50 -07:00
Karthik Nayak	644daf7785	refs: add support for transactional symref updates The reference backends currently support transactional reference updates. While this is exposed to users via 'git-update-ref' and its '--stdin' mode, it is also used internally within various commands. However, we do not support transactional updates of symrefs. This commit adds support for symrefs in both the 'files' and the 'reftable' backend. Here, we add and use `ref_update_has_null_new_value()`, a helper function which is used to check if there is a new_value in a reference update. The new value could either be a symref target `new_target` or a OID `new_oid`. We also add another common function `ref_update_check_old_target` which will be used to check if the update's old_target corresponds to a reference's current target. Now transactional updates (verify, create, delete, update) can be used for: - regular refs - symbolic refs - conversion of regular to symbolic refs and vice versa This also allows us to expose this to users via new commands in 'git-update-ref' in the future. Note that a dangling symref update does not record a new reflog entry, which is unchanged before and after this commit. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-07 08:51:49 -07:00
Karthik Nayak	e9965ba477	refs: move `original_update_refname` to 'refs.c' The files backend and the reftable backend implement `original_update_refname` to obtain the original refname of the update. Move it out to 'refs.c' and only expose it internally to the refs library. This will be used in an upcoming commit to also introduce another common functionality for the two backends. We also rename the function to `ref_update_original_update_refname` to keep it consistent with the upcoming other 'ref_update_*' functions that'll be introduced. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-07 08:51:49 -07:00
Karthik Nayak	57d0b1e2ea	files-backend: extract out `create_symref_lock()` The function `create_symref_locked()` creates a symref by creating a '<symref>.lock' file and then committing the symref lock, which creates the final symref. Extract the early half of `create_symref_locked()` into a new helper function `create_symref_lock()`. Because the name of the new function is too similar to the original, rename the original to `create_and_commit_symref()` to avoid confusion. The new function `create_symref_locked()` can be used to create the symref lock in a separate step from that of committing it. This allows to add transactional support for symrefs, where the lock would be created in the preparation step and the lock would be committed in the finish step. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-07 08:51:49 -07:00
Karthik Nayak	1bc4cc3fc4	refs: accept symref values in `ref_transaction_update()` The function `ref_transaction_update()` obtains ref information and flags to create a `ref_update` and add them to the transaction at hand. To extend symref support in transactions, we need to also accept the old and new ref targets and process it. This commit adds the required parameters to the function and modifies all call sites. The two parameters added are `new_target` and `old_target`. The `new_target` is used to denote what the reference should point to when the transaction is applied. Some functions allow this parameter to be NULL, meaning that the reference is not changed. The `old_target` denotes the value the reference must have before the update. Some functions allow this parameter to be NULL, meaning that the old value of the reference is not checked. We also update the internal function `ref_transaction_add_update()` similarly to take the two new parameters. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-07 08:51:49 -07:00
Ivan Tse	9339fca23e	refs: return conflict error when checking packed refs The TRANSACTION_NAME_CONFLICT error code refers to a failure to create a ref due to a name conflict with another ref. An example of this is a directory/file conflict such as ref names A/B and A. "git fetch" uses this error code to more accurately describe the error by recommending to the user that they try running "git remote prune" to remove any old refs that are deleted by the remote which would clear up any directory/file conflicts. This helpful error message is not displayed when the conflicted ref is stored in packed refs. This change fixes this by ensuring error return code consistency in `lock_raw_ref`. Signed-off-by: Ivan Tse <ivan.tse1@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-05-06 08:48:25 -07:00
Junio C Hamano	eacfd581d2	Merge branch 'ps/pack-refs-auto' "git pack-refs" learned the "--auto" option, which is a useful addition to be triggered from "git gc --auto". Acked-by: Karthik Nayak <karthik.188@gmail.com> cf. <CAOLa=ZRAEA7rSUoYL0h-2qfEELdbPHbeGpgBJRqesyhHi9Q6WQ@mail.gmail.com> * ps/pack-refs-auto: builtin/gc: pack refs when using `git maintenance run --auto` builtin/gc: forward git-gc(1)'s `--auto` flag when packing refs t6500: extract objects with "17" prefix builtin/gc: move `struct maintenance_run_opts` builtin/pack-refs: introduce new "--auto" flag builtin/pack-refs: release allocated memory refs/reftable: expose auto compaction via new flag refs: remove `PACK_REFS_ALL` flag refs: move `struct pack_refs_opts` to where it's used t/helper: drop pack-refs wrapper refs/reftable: print errors on compaction failure reftable/stack: gracefully handle failed auto-compaction due to locks reftable/stack: use error codes when locking fails during compaction reftable/error: discern locked/outdated errors reftable/stack: fix error handling in `reftable_stack_init_addition()`	2024-04-09 14:31:45 -07:00
Patrick Steinhardt	44afd85fbd	refs/reftable: don't recompute committer ident In order to write reflog entries we need to compute the committer's identity as it gets encoded in the log record itself. The reftable backend does this via `git_committer_info()` and `split_ident_line()` in `fill_reftable_log_record()`, which use the Git config as well as environment variables to figure out the identity. While most callers would only call `fill_reftable_log_record()` once or twice, `write_transaction_table()` will call it as many times as there are queued ref updates. This can be quite a waste of effort when writing many refs with reflog entries in a single transaction. Refactor the code to pre-compute the committer information. This results in a small speedup when writing 100000 refs in a single transaction: Benchmark 1: update-ref: create many refs (HEAD~) Time (mean ± σ): 2.895 s ± 0.020 s [User: 1.516 s, System: 1.374 s] Range (min … max): 2.868 s … 2.983 s 100 runs Benchmark 2: update-ref: create many refs (HEAD) Time (mean ± σ): 2.845 s ± 0.017 s [User: 1.461 s, System: 1.379 s] Range (min … max): 2.803 s … 2.913 s 100 runs Summary update-ref: create many refs (HEAD) ran 1.02 ± 0.01 times faster than update-ref: create many refs (HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-08 17:01:41 -07:00
Patrick Steinhardt	485c63cf5c	reftable: remove name checks In the preceding commit we have disabled name checks in the "reftable" backend. These checks were responsible for verifying multiple things when writing records to the reftable stack: - Detecting file/directory conflicts. Starting with the preceding commits this is now handled by the reftable backend itself via `refs_verify_refname_available()`. - Validating refnames. This is handled by `check_refname_format()` in the generic ref transacton layer. The code in the reftable library is thus not used anymore and likely to bitrot over time. Remove it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-08 17:01:41 -07:00
Patrick Steinhardt	4af31dc84a	refs/reftable: skip duplicate name checks All the callback functions which write refs in the reftable backend perform D/F conflict checks via `refs_verify_refname_available()`. But in reality we perform these D/F conflict checks a second time in the reftable library via `stack_check_addition()`. Interestingly, the code in the reftable library is inferior compared to the generic function: - It is slower than `refs_verify_refname_available()`, even though this can probably be optimized. - It does not provide a proper error message to the caller, and thus all the user would see is a generic "file/directory conflict" message. Disable the D/F conflict checks in the reftable library by setting the `skip_name_check` write option. This results in a non-negligible speedup when writing many refs. The following benchmark writes 100k refs in a single transaction: Benchmark 1: update-ref: create many refs (HEAD~) Time (mean ± σ): 3.241 s ± 0.040 s [User: 1.854 s, System: 1.381 s] Range (min … max): 3.185 s … 3.454 s 100 runs Benchmark 2: update-ref: create many refs (HEAD) Time (mean ± σ): 2.878 s ± 0.024 s [User: 1.506 s, System: 1.367 s] Range (min … max): 2.838 s … 2.960 s 100 runs Summary update-ref: create many refs (HEAD~) ran 1.13 ± 0.02 times faster than update-ref: create many refs (HEAD) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-08 16:59:02 -07:00
Patrick Steinhardt	455d61b6d2	refs/reftable: perform explicit D/F check when writing symrefs We already perform explicit D/F checks in all reftable callbacks which write refs, except when writing symrefs. For one this leads to an error message which isn't perfectly actionable because we only tell the user that there was a D/F conflict, but not which refs conflicted with each other. But second, once all ref updating callbacks explicitly check for D/F conflicts, we can disable the D/F checks in the reftable library itself and thus avoid some duplicated efforts. Refactor the code that writes symref tables to explicitly call into `refs_verify_refname_available()` when writing symrefs. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-08 16:59:01 -07:00
Patrick Steinhardt	f57cc987a9	refs/reftable: fix D/F conflict error message on ref copy The `write_copy_table()` function is shared between the reftable implementations for renaming and copying refs. The only difference between those two cases is that the rename will also delete the old reference, whereas copying won't. This has resulted in a bug though where we don't properly verify refname availability. When calling `refs_verify_refname_available()`, we always add the old ref name to the list of refs to be skipped when computing availability, which indicates that the name would be available even if it already exists at the current point in time. This is only the right thing to do for renames though, not for copies. The consequence of this bug is quite harmless because the reftable backend has its own checks for D/F conflicts further down in the call stack, and thus we refuse the update regardless of the bug. But all the user gets in this case is an uninformative message that copying the ref has failed, without any further details. Fix the bug and only add the old name to the skip-list in case we rename the ref. Consequently, this error case will now be handled by `refs_verify_refname_available()`, which knows to provide a proper error message. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-08 16:59:01 -07:00
Justin Tobler	7c8eb5928f	reftable/stack: add env to disable autocompaction In future tests it will be neccesary to create repositories with a set number of tables. To make this easier, introduce the `GIT_TEST_REFTABLE_AUTOCOMPACTION` environment variable that, when set to false, disables autocompaction of reftables. Signed-off-by: Justin Tobler <jltobler@gmail.com> Acked-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-04-08 12:11:10 -07:00
Junio C Hamano	7424fb7797	Merge branch 'ps/pack-refs-auto' into jt/reftable-geometric-compaction * ps/pack-refs-auto: builtin/gc: pack refs when using `git maintenance run --auto` builtin/gc: forward git-gc(1)'s `--auto` flag when packing refs t6500: extract objects with "17" prefix builtin/gc: move `struct maintenance_run_opts` builtin/pack-refs: introduce new "--auto" flag builtin/pack-refs: release allocated memory refs/reftable: expose auto compaction via new flag refs: remove `PACK_REFS_ALL` flag refs: move `struct pack_refs_opts` to where it's used t/helper: drop pack-refs wrapper refs/reftable: print errors on compaction failure reftable/stack: gracefully handle failed auto-compaction due to locks reftable/stack: use error codes when locking fails during compaction reftable/error: discern locked/outdated errors reftable/stack: fix error handling in `reftable_stack_init_addition()`	2024-04-05 10:34:23 -07:00
Patrick Steinhardt	f89356db4a	refs/reftable: expose auto compaction via new flag Under normal circumstances, the "reftable" backend will automatically perform compaction after appending to the stack. It is thus not necessary and may even be considered wasteful to run git-pack-refs(1) in "reftable"-backed repositories as it will cause the backend to compact all tables into a single one. We do exactly that though when running `git maintenance run --auto` or `git gc --auto`, which gets spawned by Git after running some specific commands. The `--auto` mode is typically only executing optimizations as needed. To do so, we already use several heuristics for the various different data structures in Git to determine whether to optimize them or not. We do not use any heuristics for refs though and instead always optimize them. Introduce a new `PACK_REFS_AUTO` flag that can be passed to the backend. When not handled by the backend we will continue to behave the exact same as we do right now, that is we optimize refs unconditionally. This is done for the "files" backend for now to retain current behaviour, even though we may eventually also want to introduce heuristics here. For the "reftable" backend though we already do have auto-compaction, so we can easily reuse that logic to implement the new auto-packing flag. Note that under normal circumstances, this should always end up being a no-op. After all, we already invoke the code for every single addition to the stack. But there are special cases where it can still be helpful to execute the auto-compaction code explicitly: - Concurrent writers may cause compaction to not run due to locks. - Callers may decide to disable compaction altogether and then pack refs at a later point due to various reasons. - Other implementations of the reftable format may do compaction differently or even not at all. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-25 09:54:07 -07:00
Patrick Steinhardt	4ccf7060d8	refs/reftable: print errors on compaction failure When git-pack-refs(1) fails in the reftable backend we end up printing no error message at all, leaving the caller puzzled as to why compaction has failed. Fix this. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-25 09:54:07 -07:00
Junio C Hamano	e8c1cda9a9	Merge branch 'ps/reftable-reflog-iteration-perf' The code to iterate over reflogs in the reftable has been optimized to reduce memory allocation and deallocation. Reviewed-by: Josh Steadmon <steadmon@google.com> cf. <Ze9eX-aaWoVaqsPP@google.com> * ps/reftable-reflog-iteration-perf: refs/reftable: track last log record name via strbuf reftable/record: use scratch buffer when decoding records reftable/record: reuse message when decoding log records reftable/record: reuse refnames when decoding log records reftable/record: avoid copying author info reftable/record: convert old and new object IDs to arrays refs/reftable: reload correct stack when creating reflog iter	2024-03-21 14:55:13 -07:00
Junio C Hamano	448a74e151	Merge branch 'ps/reftable-iteration-perf-part2' The code to iterate over refs with the reftable backend has seen some optimization. * ps/reftable-iteration-perf-part2: refs/reftable: precompute prefix length reftable: allow inlining of a few functions reftable/record: decode keys in place reftable/record: reuse refname when copying reftable/record: reuse refname when decoding reftable/merged: avoid duplicate pqueue emptiness check reftable/merged: circumvent pqueue with single subiter reftable/merged: handle subiter cleanup on close only reftable/merged: remove unnecessary null check for subiters reftable/merged: make subiters own their records reftable/merged: advance subiter on subsequent iteration reftable/merged: make `merged_iter` structure private reftable/pq: use `size_t` to track iterator index	2024-03-14 14:05:23 -07:00
Junio C Hamano	963a277a52	Merge branch 'ps/reftable-repo-init-fix' Clear the fallout from a fix for 2.44 regression. * ps/reftable-repo-init-fix: t0610: remove unused variable assignment refs/reftable: don't fail empty transactions in repo without HEAD	2024-03-07 15:59:42 -08:00
Junio C Hamano	d037212d97	Merge branch 'kn/for-all-refs' "git for-each-ref" learned "--include-root-refs" option to show even the stuff outside the 'refs/' hierarchy. * kn/for-all-refs: for-each-ref: add new option to include root refs ref-filter: rename 'FILTER_REFS_ALL' to 'FILTER_REFS_REGULAR' refs: introduce `refs_for_each_include_root_refs()` refs: extract out `loose_fill_ref_dir_regular_file()` refs: introduce `is_pseudoref()` and `is_headref()`	2024-03-05 09:44:44 -08:00
Patrick Steinhardt	fcacc2b161	refs/reftable: track last log record name via strbuf The reflog iterator enumerates all reflogs known to a ref backend. In the "reftable" backend there is no way to list all existing reflogs directly. Instead, we have to iterate through all reflog entries and discard all those redundant entries for which we have already returned a reflog entry. This logic is implemented by tracking the last reflog name that we have emitted to the iterator's user. If the next log record has the same name we simply skip it until we find another record with a different refname. This last reflog name is stored in a simple C string, which requires us to free and reallocate it whenever we need to update the reflog name. Convert it to use a `struct strbuf` instead, which reduces the number of allocations. Before: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 1,068,485 allocs, 1,068,363 frees, 281,122,886 bytes allocated After: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 68,485 allocs, 68,363 frees, 256,234,072 bytes allocated Note that even after this change we still allocate quite a lot of data, even though the number of allocations does not scale with the number of log records anymore. This remainder comes mostly from decompressing the log blocks, where we decompress each block into newly allocated memory. This will be addressed at a later point in time. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-05 09:10:07 -08:00
Patrick Steinhardt	87ff723018	reftable/record: convert old and new object IDs to arrays In `7af607c58d` (reftable/record: store "val1" hashes as static arrays, 2024-01-03) and `b31e3cc620` (reftable/record: store "val2" hashes as static arrays, 2024-01-03) we have converted ref records to store their object IDs in a static array. Convert log records to do the same so that their old and new object IDs are arrays, too. This change results in two allocations less per log record that we're iterating over. Before: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 8,068,495 allocs, 8,068,373 frees, 401,011,862 bytes allocated After: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 6,068,489 allocs, 6,068,367 frees, 361,011,822 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-05 09:10:06 -08:00
Patrick Steinhardt	eea0d11d6d	refs/reftable: reload correct stack when creating reflog iter When creating a new reflog iterator, we first have to reload the stack that the iterator is being created. This is done so that any concurrent writes to the stack are reflected. But `reflog_iterator_for_stack()` always reloads the main stack, which is wrong. Fix this and reload the correct stack. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-05 09:10:06 -08:00
Junio C Hamano	2efe7958d6	Merge branch 'ps/reftable-iteration-perf-part2' into ps/reftable-reflog-iteration-perf * ps/reftable-iteration-perf-part2: refs/reftable: precompute prefix length reftable: allow inlining of a few functions reftable/record: decode keys in place reftable/record: reuse refname when copying reftable/record: reuse refname when decoding reftable/merged: avoid duplicate pqueue emptiness check reftable/merged: circumvent pqueue with single subiter reftable/merged: handle subiter cleanup on close only reftable/merged: remove unnecessary null check for subiters reftable/merged: make subiters own their records reftable/merged: advance subiter on subsequent iteration reftable/merged: make `merged_iter` structure private reftable/pq: use `size_t` to track iterator index	2024-03-05 09:09:46 -08:00
Patrick Steinhardt	43f70eaea0	refs/reftable: precompute prefix length We're recomputing the prefix length on every iteration of the ref iterator. Precompute it for another speedup when iterating over 1 million refs: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 100.3 ms ± 3.7 ms [User: 97.3 ms, System: 2.8 ms] Range (min … max): 97.5 ms … 139.7 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 95.8 ms ± 3.4 ms [User: 92.9 ms, System: 2.8 ms] Range (min … max): 93.0 ms … 121.9 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-03-04 10:19:58 -08:00
Patrick Steinhardt	b0f6b6b523	refs/reftable: don't fail empty transactions in repo without HEAD Under normal circumstances, it shouldn't ever happen that a repository has no HEAD reference. In fact, git-update-ref(1) would fail any request to delete the HEAD reference, and a newly initialized repository always pre-creates it, too. We have however changed git-clone(1) to partially initialize the refdb just up to the point where remote helpers can find the repository. With that change, we are going to run into a situation where repositories have no refs at all. Now there is a very particular edge case in this situation: when preparing an empty ref transacton, we end up returning whatever value `read_ref_without_reload()` returned to the caller. Under normal conditions this would be fine: "HEAD" should usually exist, and thus the function would return `0`. But if "HEAD" doesn't exist, the function returns a positive value which we end up returning to the caller. Fix this bug by resetting the return code to `0` and add a test. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-27 13:53:39 -08:00
Karthik Nayak	33d15b5435	for-each-ref: add new option to include root refs The git-for-each-ref(1) command doesn't provide a way to print root refs i.e pseudorefs and HEAD with the regular "refs/" prefixed refs. This commit adds a new option "--include-root-refs" to git-for-each-ref(1). When used this would also print pseudorefs and HEAD for the current worktree. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-23 10:36:28 -08:00
Karthik Nayak	d0f00c1ac1	refs: introduce `refs_for_each_include_root_refs()` Introduce a new ref iteration flag `DO_FOR_EACH_INCLUDE_ROOT_REFS`, which will be used to iterate over regular refs plus pseudorefs and HEAD. Refs which fall outside the `refs/` and aren't either pseudorefs or HEAD are more of a grey area. This is because we don't block the users from creating such refs but they are not officially supported. Introduce `refs_for_each_include_root_refs()` which calls `do_for_each_ref()` with this newly introduced flag. In `refs/files-backend.c`, introduce a new function `add_pseudoref_and_head_entries()` to add pseudorefs and HEAD to the `ref_dir`. We then finally call `add_pseudoref_and_head_entries()` whenever the `DO_FOR_EACH_INCLUDE_ROOT_REFS` flag is set. Any new ref backend will also have to implement similar changes on its end. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-23 10:36:27 -08:00
Karthik Nayak	f768296cf1	refs: extract out `loose_fill_ref_dir_regular_file()` Extract out the code for adding a single file to the loose ref dir as `loose_fill_ref_dir_regular_file()` from `loose_fill_ref_dir()` in `refs/files-backend.c`. This allows us to use this function independently in the following commits where we add code to also add pseudorefs to the ref dir. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-23 10:36:27 -08:00
Patrick Steinhardt	59c50a96c5	refs: stop resolving ref corresponding to reflogs The reflog iterator tries to resolve the corresponding ref for every reflog that it is about to yield. Historically, this was done due to multiple reasons: - It ensures that the refname is safe because we end up calling `check_refname_format()`. Also, non-conformant refnames are skipped altogether. - The iterator used to yield the resolved object ID as well as its flags to the callback. This info was never used though, and the corresponding parameters were dropped in the preceding commit. - When a ref is corrupt then the reflog is not emitted at all. We're about to introduce a new `git reflog list` subcommand that will print all reflogs that the refdb knows about. Skipping over reflogs whose refs are corrupted would be quite counterproductive in this case as the user would have no way to learn about reflogs which may still exist in their repository to help and rescue such a corrupted ref. Thus, the only remaining reason for why we'd want to resolve the ref is to verify its refname. Refactor the code to call `check_refname_format()` directly instead of trying to resolve the ref. This is significantly more efficient given that we don't have to hit the object database anymore to list reflogs. And second, it ensures that we end up showing reflogs of broken refs, which will help to make the reflog more useful. Note that this really only impacts the case where the corresponding ref is corrupt. Reflogs for nonexistent refs would have been returned to the caller beforehand already as we did not pass `RESOLVE_REF_READING` to the function, and thus `refs_resolve_ref_unsafe()` would have returned successfully in that case. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-21 09:58:06 -08:00
Patrick Steinhardt	31f898397b	refs: drop unused params from the reflog iterator callback The ref and reflog iterators share much of the same underlying code to iterate over the corresponding entries. This results in some weird code because the reflog iterator also exposes an object ID as well as a flag to the callback function. Neither of these fields do refer to the reflog though -- they refer to the corresponding ref with the same name. This is quite misleading. In practice at least the object ID cannot really be implemented in any other way as a reflog does not have a specific object ID in the first place. This is further stressed by the fact that none of the callbacks except for our test helper make use of these fields. Split up the infrastucture so that ref and reflog iterators use separate callback signatures. This allows us to drop the nonsensical fields from the reflog iterator. Note that internally, the backends still use the same shared infra to iterate over both types. As the backends should never end up being called directly anyway, this is not much of a problem and thus kept as-is for simplicity's sake. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-21 09:58:06 -08:00
Patrick Steinhardt	5e01d83841	refs: always treat iterators as ordered In the preceding commit we have converted the reflog iterator of the "files" backend to be ordered, which was the only remaining ref iterator that wasn't ordered. Refactor the ref iterator infrastructure so that we always assume iterators to be ordered, thus simplifying the code. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-21 09:58:06 -08:00
Patrick Steinhardt	6f22780017	refs/files: sort merged worktree and common reflogs When iterating through reflogs in a worktree we create a merged iterator that merges reflogs from both refdbs. The resulting refs are ordered so that instead we first return all worktree reflogs before we return all common refs. This is the only remaining case where a ref iterator returns entries in a non-lexicographic order. The result would look something like the following (listed with a command we introduce in a subsequent commit): ``` $ git reflog list HEAD refs/worktree/per-worktree refs/heads/main refs/heads/wt ``` So we first print the per-worktree reflogs in lexicographic order, then the common reflogs in lexicographic order. This is confusing and not consistent with how we print per-worktree refs, which are exclusively sorted lexicographically. Sort reflogs lexicographically in the same way as we sort normal refs. As this is already implemented properly by the "reftable" backend via a separate selection function, we simply pull out that logic and reuse it for the "files" backend. As logs are properly sorted now, mark the merged reflog iterator as sorted. Tests will be added in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-21 09:58:06 -08:00
Patrick Steinhardt	e69e8ffef7	refs/files: sort reflogs returned by the reflog iterator We use a directory iterator to return reflogs via the reflog iterator. This iterator returns entries in the same order as readdir(3P) would and will thus yield reflogs with no discernible order. Set the new `DIR_ITERATOR_SORTED` flag that was introduced in the preceding commit so that the order is deterministic. While the effect of this can only been observed in a test tool, a subsequent commit will start to expose this functionality to users via a new `git reflog list` subcommand. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-21 09:58:05 -08:00
Patrick Steinhardt	8a0bebdeae	refs/reftable: fix leak when copying reflog fails When copying a ref with the reftable backend we also copy the corresponding log records. When seeking the first log record that we're about to copy fails though we directly return from `write_copy_table()` without doing any cleanup, leaking several allocated data structures. Fix this by exiting via our common cleanup logic instead. Reported-by: Jeff King <peff@peff.net> via Coverity Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-07 21:30:43 -08:00
Patrick Steinhardt	57db2a094d	refs: introduce reftable backend Due to scalability issues, Shawn Pearce has originally proposed a new "reftable" format more than six years ago [1]. Initially, this new format was implemented in JGit with promising results. Around two years ago, we have then added the "reftable" library to the Git codebase via `a4bbd13be3` (Merge branch 'hn/reftable', 2021-12-15). With this we have landed all the low-level code to read and write reftables. Notably missing though was the integration of this low-level code into the Git code base in the form of a new ref backend that ties all of this together. This gap is now finally closed by introducing a new "reftable" backend into the Git codebase. This new backend promises to bring some notable improvements to Git repositories: - It becomes possible to do truly atomic writes where either all refs are committed to disk or none are. This was not possible with the "files" backend because ref updates were split across multiple loose files. - The disk space required to store many refs is reduced, both compared to loose refs and packed-refs. This is enabled both by the reftable format being a binary format, which is more compact, and by prefix compression. - We can ignore filesystem-specific behaviour as ref names are not encoded via paths anymore. This means there is no need to handle case sensitivity on Windows systems or Unicode precomposition on macOS. - There is no need to rewrite the complete refdb anymore every time a ref is being deleted like it was the case for packed-refs. This means that ref deletions are now constant time instead of scaling linearly with the number of refs. - We can ignore file/directory conflicts so that it becomes possible to store both "refs/heads/foo" and "refs/heads/foo/bar". - Due to this property we can retain reflogs for deleted refs. We have previously been deleting reflogs together with their refs to avoid file/directory conflicts, which is not necessary anymore. - We can properly enumerate all refs. With the "files" backend it is not easily possible to distinguish between refs and non-refs because they may live side by side in the gitdir. Not all of these improvements are realized with the current "reftable" backend implementation. At this point, the new backend is supposed to be a drop-in replacement for the "files" backend that is used by basically all Git repositories nowadays. It strives for 1:1 compatibility, which means that a user can expect the same behaviour regardless of whether they use the "reftable" backend or the "files" backend for most of the part. Most notably, this means we artificially limit the capabilities of the "reftable" backend to match the limits of the "files" backend. It is not possible to create refs that would end up with file/directory conflicts, we do not retain reflogs, we perform stricter-than-necessary checks. This is done intentionally due to two main reasons: - It makes it significantly easier to land the "reftable" backend as tests behave the same. It would be tough to argue for each and every single test that doesn't pass with the "reftable" backend. - It ensures compatibility between repositories that use the "files" backend and repositories that use the "reftable" backend. Like this, hosters can migrate their repositories to use the "reftable" backend without causing issues for clients that use the "files" backend in their clones. It is expected that these artificial limitations may eventually go away in the long term. Performance-wise things very much depend on the actual workload. The following benchmarks compare the "files" and "reftable" backends in the current version: - Creating N refs in separate transactions shows that the "files" backend is ~50% faster. This is not surprising given that creating a ref only requires us to create a single loose ref. The "reftable" backend will also perform auto compaction on updates. In real-world workloads we would likely also want to perform pack loose refs, which would likely change the picture. Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1) Time (mean ± σ): 2.1 ms ± 0.3 ms [User: 0.6 ms, System: 1.7 ms] Range (min … max): 1.8 ms … 4.3 ms 133 runs Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1) Time (mean ± σ): 2.7 ms ± 0.1 ms [User: 0.6 ms, System: 2.2 ms] Range (min … max): 2.4 ms … 2.9 ms 132 runs Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000) Time (mean ± σ): 1.975 s ± 0.006 s [User: 0.437 s, System: 1.535 s] Range (min … max): 1.969 s … 1.980 s 3 runs Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000) Time (mean ± σ): 2.611 s ± 0.013 s [User: 0.782 s, System: 1.825 s] Range (min … max): 2.597 s … 2.622 s 3 runs Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000) Time (mean ± σ): 198.442 s ± 0.241 s [User: 43.051 s, System: 155.250 s] Range (min … max): 198.189 s … 198.670 s 3 runs Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000) Time (mean ± σ): 294.509 s ± 4.269 s [User: 104.046 s, System: 190.326 s] Range (min … max): 290.223 s … 298.761 s 3 runs - Creating N refs in a single transaction shows that the "files" backend is significantly slower once we start to write many refs. The "reftable" backend only needs to update two files, whereas the "files" backend needs to write one file per ref. Benchmark 1: update-ref: create many refs (refformat = files, refcount = 1) Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.4 ms, System: 1.4 ms] Range (min … max): 1.8 ms … 2.6 ms 151 runs Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 1) Time (mean ± σ): 2.5 ms ± 0.1 ms [User: 0.7 ms, System: 1.7 ms] Range (min … max): 2.4 ms … 3.4 ms 148 runs Benchmark 3: update-ref: create many refs (refformat = files, refcount = 1000) Time (mean ± σ): 152.5 ms ± 5.2 ms [User: 19.1 ms, System: 133.1 ms] Range (min … max): 148.5 ms … 167.8 ms 15 runs Benchmark 4: update-ref: create many refs (refformat = reftable, refcount = 1000) Time (mean ± σ): 58.0 ms ± 2.5 ms [User: 28.4 ms, System: 29.4 ms] Range (min … max): 56.3 ms … 72.9 ms 40 runs Benchmark 5: update-ref: create many refs (refformat = files, refcount = 1000000) Time (mean ± σ): 152.752 s ± 0.710 s [User: 20.315 s, System: 131.310 s] Range (min … max): 152.165 s … 153.542 s 3 runs Benchmark 6: update-ref: create many refs (refformat = reftable, refcount = 1000000) Time (mean ± σ): 51.912 s ± 0.127 s [User: 26.483 s, System: 25.424 s] Range (min … max): 51.769 s … 52.012 s 3 runs - Deleting a ref in a fully-packed repository shows that the "files" backend scales with the number of refs. The "reftable" backend has constant-time deletions. Benchmark 1: update-ref: delete ref (refformat = files, refcount = 1) Time (mean ± σ): 1.7 ms ± 0.1 ms [User: 0.4 ms, System: 1.2 ms] Range (min … max): 1.6 ms … 2.1 ms 316 runs Benchmark 2: update-ref: delete ref (refformat = reftable, refcount = 1) Time (mean ± σ): 1.8 ms ± 0.1 ms [User: 0.4 ms, System: 1.3 ms] Range (min … max): 1.7 ms … 2.1 ms 294 runs Benchmark 3: update-ref: delete ref (refformat = files, refcount = 1000) Time (mean ± σ): 2.0 ms ± 0.1 ms [User: 0.5 ms, System: 1.4 ms] Range (min … max): 1.9 ms … 2.5 ms 287 runs Benchmark 4: update-ref: delete ref (refformat = reftable, refcount = 1000) Time (mean ± σ): 1.9 ms ± 0.1 ms [User: 0.5 ms, System: 1.3 ms] Range (min … max): 1.8 ms … 2.1 ms 217 runs Benchmark 5: update-ref: delete ref (refformat = files, refcount = 1000000) Time (mean ± σ): 229.8 ms ± 7.9 ms [User: 182.6 ms, System: 46.8 ms] Range (min … max): 224.6 ms … 245.2 ms 6 runs Benchmark 6: update-ref: delete ref (refformat = reftable, refcount = 1000000) Time (mean ± σ): 2.0 ms ± 0.0 ms [User: 0.6 ms, System: 1.3 ms] Range (min … max): 2.0 ms … 2.1 ms 3 runs - Listing all refs shows no significant advantage for either of the backends. The "files" backend is a bit faster, but not by a significant margin. When repositories are not packed the "reftable" backend outperforms the "files" backend because the "reftable" backend performs auto-compaction. Benchmark 1: show-ref: print all refs (refformat = files, refcount = 1, packed = true) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.0 ms 1729 runs Benchmark 2: show-ref: print all refs (refformat = reftable, refcount = 1, packed = true) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 1.8 ms 1816 runs Benchmark 3: show-ref: print all refs (refformat = files, refcount = 1000, packed = true) Time (mean ± σ): 4.3 ms ± 0.1 ms [User: 0.9 ms, System: 3.3 ms] Range (min … max): 4.1 ms … 4.6 ms 645 runs Benchmark 4: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = true) Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.0 ms, System: 3.3 ms] Range (min … max): 4.2 ms … 5.9 ms 643 runs Benchmark 5: show-ref: print all refs (refformat = files, refcount = 1000000, packed = true) Time (mean ± σ): 2.537 s ± 0.034 s [User: 0.488 s, System: 2.048 s] Range (min … max): 2.511 s … 2.627 s 10 runs Benchmark 6: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = true) Time (mean ± σ): 2.712 s ± 0.017 s [User: 0.653 s, System: 2.059 s] Range (min … max): 2.692 s … 2.752 s 10 runs Benchmark 7: show-ref: print all refs (refformat = files, refcount = 1, packed = false) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 1.9 ms 1834 runs Benchmark 8: show-ref: print all refs (refformat = reftable, refcount = 1, packed = false) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 2.0 ms 1840 runs Benchmark 9: show-ref: print all refs (refformat = files, refcount = 1000, packed = false) Time (mean ± σ): 13.8 ms ± 0.2 ms [User: 2.8 ms, System: 10.8 ms] Range (min … max): 13.3 ms … 14.5 ms 208 runs Benchmark 10: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = false) Time (mean ± σ): 4.5 ms ± 0.2 ms [User: 1.2 ms, System: 3.3 ms] Range (min … max): 4.3 ms … 6.2 ms 624 runs Benchmark 11: show-ref: print all refs (refformat = files, refcount = 1000000, packed = false) Time (mean ± σ): 12.127 s ± 0.129 s [User: 2.675 s, System: 9.451 s] Range (min … max): 11.965 s … 12.370 s 10 runs Benchmark 12: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = false) Time (mean ± σ): 2.799 s ± 0.022 s [User: 0.735 s, System: 2.063 s] Range (min … max): 2.769 s … 2.836 s 10 runs - Printing a single ref shows no real difference between the "files" and "reftable" backends. Benchmark 1: show-ref: print single ref (refformat = files, refcount = 1) Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.4 ms, System: 1.0 ms] Range (min … max): 1.4 ms … 1.8 ms 1779 runs Benchmark 2: show-ref: print single ref (refformat = reftable, refcount = 1) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 2.5 ms 1753 runs Benchmark 3: show-ref: print single ref (refformat = files, refcount = 1000) Time (mean ± σ): 1.5 ms ± 0.1 ms [User: 0.3 ms, System: 1.1 ms] Range (min … max): 1.4 ms … 1.9 ms 1840 runs Benchmark 4: show-ref: print single ref (refformat = reftable, refcount = 1000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.0 ms 1831 runs Benchmark 5: show-ref: print single ref (refformat = files, refcount = 1000000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.1 ms 1848 runs Benchmark 6: show-ref: print single ref (refformat = reftable, refcount = 1000000) Time (mean ± σ): 1.6 ms ± 0.1 ms [User: 0.4 ms, System: 1.1 ms] Range (min … max): 1.5 ms … 2.1 ms 1762 runs So overall, performance depends on the usecases. Except for many sequential writes the "reftable" backend is roughly on par or significantly faster than the "files" backend though. Given that the "files" backend has received 18 years of optimizations by now this can be seen as a win. Furthermore, we can expect that the "reftable" backend will grow faster over time when attention turns more towards optimizations. The complete test suite passes, except for those tests explicitly marked to require the REFFILES prerequisite. Some tests in t0610 are marked as failing because they depend on still-in-flight bug fixes. Tests can be run with the new backend by setting the GIT_TEST_DEFAULT_REF_FORMAT environment variable to "reftable". There is a single known conceptual incompatibility with the dumb HTTP transport. As "info/refs" SHOULD NOT contain the HEAD reference, and because the "HEAD" file is not valid anymore, it is impossible for the remote client to figure out the default branch without changing the protocol. This shortcoming needs to be handled in a subsequent patch series. As the reftable library has already been introduced a while ago, this commit message will not go into the details of how exactly the on-disk format works. Please refer to our preexisting technical documentation at Documentation/technical/reftable for this. [1]: https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=S-DpZ5dR1aF+VHVETKG20OQ@mail.gmail.com/ Original-idea-by: Shawn Pearce <spearce@spearce.org> Based-on-patch-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-02-07 08:28:37 -08:00
Junio C Hamano	dc8ce995a2	Merge branch 'ps/worktree-refdb-initialization' Instead of manually creating refs/ hierarchy on disk upon a creation of a secondary worktree, which is only usable via the files backend, use the refs API to populate it. * ps/worktree-refdb-initialization: builtin/worktree: create refdb via ref backend worktree: expose interface to look up worktree by name builtin/worktree: move setup of commondir file earlier refs/files: skip creation of "refs/{heads,tags}" for worktrees setup: move creation of "refs/" into the files backend refs: prepare `refs_init_db()` for initializing worktree refs	2024-01-26 08:54:46 -08:00
Junio C Hamano	32c6fc3e30	Merge branch 'ps/refstorage-extension' Introduce a new extension "refstorage" so that we can mark a repository that uses a non-default ref backend, like reftable. * ps/refstorage-extension: t9500: write "extensions.refstorage" into config builtin/clone: introduce `--ref-format=` value flag builtin/init: introduce `--ref-format=` value flag builtin/rev-parse: introduce `--show-ref-format` flag t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar setup: introduce GIT_DEFAULT_REF_FORMAT envvar setup: introduce "extensions.refStorage" extension setup: set repository's formats on init setup: start tracking ref storage format refs: refactor logic to look up storage backends worktree: skip reading HEAD when repairing worktrees t: introduce DEFAULT_REPO_FORMAT prereq	2024-01-16 10:11:57 -08:00
Junio C Hamano	492ee03f60	Merge branch 'en/header-cleanup' Remove unused header "#include". * en/header-cleanup: treewide: remove unnecessary includes in source files treewide: add direct includes currently only pulled in transitively trace2/tr2_tls.h: remove unnecessary include submodule-config.h: remove unnecessary include pkt-line.h: remove unnecessary include line-log.h: remove unnecessary include http.h: remove unnecessary include fsmonitor--daemon.h: remove unnecessary includes blame.h: remove unnecessary includes archive.h: remove unnecessary include treewide: remove unnecessary includes in source files treewide: remove unnecessary includes from header files	2024-01-08 14:05:15 -08:00
Patrick Steinhardt	2eb1d0c452	refs/files: skip creation of "refs/{heads,tags}" for worktrees The files ref backend will create both "refs/heads" and "refs/tags" in the Git directory. While this logic makes sense for normal repositories, it does not for worktrees because those refs are "common" refs that would always be contained in the main repository's ref database. Introduce a new flag telling the backend that it is expected to create a per-worktree ref database and skip creation of these dirs in the files backend when the flag is set. No other backends (currently) need worktree-specific logic, so this is the only required change to start creating per-worktree ref databases via `refs_init_db()`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 13:17:30 -08:00
Patrick Steinhardt	c358d165f2	setup: move creation of "refs/" into the files backend When creating the ref database we unconditionally create the "refs/" directory in "setup.c". This is a mandatory prerequisite for all Git repositories regardless of the ref backend in use, because Git will be unable to detect the directory as a repository if "refs/" doesn't exist. We are about to add another new caller that will want to create a ref database when creating worktrees. We would require the same logic to create the "refs/" directory even though the caller really should not care about such low-level details. Ideally, the ref database should be fully initialized after calling `refs_init_db()`. Move the code to create the directory into the files backend itself to make it so. This means that future ref backends will also need to have equivalent logic around to ensure that the directory exists, but it seems a lot more sensible to have it this way round than to require callers to create the directory themselves. An alternative to this would be to create "refs/" in `refs_init_db()` directly. This feels conceptually unclean though as the creation of the refdb is now cluttered across different callsites. Furthermore, both the "files" and the upcoming "reftable" backend write backend-specific data into the "refs/" directory anyway, so splitting up this logic would only make it harder to reason about. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 13:17:30 -08:00
Patrick Steinhardt	2e573d61ff	refs: prepare `refs_init_db()` for initializing worktree refs The purpose of `refs_init_db()` is to initialize the on-disk files of a new ref database. The function is quite inflexible right now though, as callers can neither specify the `struct ref_store` nor can they pass any flags. Refactor the interface to accept both of these. This will be required so that we can start initializing per-worktree ref databases via the ref backend instead of open-coding the initialization in "worktree.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2024-01-08 13:17:30 -08:00

1 2 3 4 5 ...

600 commits