Commit graph

638 commits

Author SHA1 Message Date
Junio C Hamano 7b472da915 Merge branch 'ps/use-the-repository'
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help
transition the codebase to rely less on the availability of the
singleton the_repository instance.

* ps/use-the-repository:
  hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE`
  t/helper: remove dependency on `the_repository` in "proc-receive"
  t/helper: fix segfault in "oid-array" command without repository
  t/helper: use correct object hash in partial-clone helper
  compat/fsmonitor: fix socket path in networked SHA256 repos
  replace-object: use hash algorithm from passed-in repository
  protocol-caps: use hash algorithm from passed-in repository
  oidset: pass hash algorithm when parsing file
  http-fetch: don't crash when parsing packfile without a repo
  hash-ll: merge with "hash.h"
  refs: avoid include cycle with "repository.h"
  global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
  hash: require hash algorithm in `empty_tree_oid_hex()`
  hash: require hash algorithm in `is_empty_{blob,tree}_oid()`
  hash: make `is_null_oid()` independent of `the_repository`
  hash: convert `oidcmp()` and `oideq()` to compare whole hash
  global: ensure that object IDs are always padded
  hash: require hash algorithm in `oidread()` and `oidclr()`
  hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`
  hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
2024-07-02 09:59:00 -07:00
Junio C Hamano 5f14d20984 Merge branch 'kn/update-ref-symref'
"git update-ref --stdin" learned to handle transactional updates of
symbolic-refs.

* kn/update-ref-symref:
  update-ref: add support for 'symref-update' command
  reftable: pick either 'oid' or 'target' for new updates
  update-ref: add support for 'symref-create' command
  update-ref: add support for 'symref-delete' command
  update-ref: add support for 'symref-verify' command
  refs: specify error for regular refs with `old_target`
  refs: create and use `ref_update_expects_existing_old_ref()`
2024-06-20 15:45:12 -07:00
Junio C Hamano 4216329457 Merge branch 'ps/no-writable-strings'
Building with "-Werror -Wwrite-strings" is now supported.

* ps/no-writable-strings: (27 commits)
  config.mak.dev: enable `-Wwrite-strings` warning
  builtin/merge: always store allocated strings in `pull_twohead`
  builtin/rebase: always store allocated string in `options.strategy`
  builtin/rebase: do not assign default backend to non-constant field
  imap-send: fix leaking memory in `imap_server_conf`
  imap-send: drop global `imap_server_conf` variable
  mailmap: always store allocated strings in mailmap blob
  revision: always store allocated strings in output encoding
  remote-curl: avoid assigning string constant to non-const variable
  send-pack: always allocate receive status
  parse-options: cast long name for OPTION_ALIAS
  http: do not assign string constant to non-const field
  compat/win32: fix const-correctness with string constants
  pretty: add casts for decoration option pointers
  object-file: make `buf` parameter of `index_mem()` a constant
  object-file: mark cached object buffers as const
  ident: add casts for fallback name and GECOS
  entry: refactor how we remove items for delayed checkouts
  line-log: always allocate the output prefix
  line-log: stop assigning string constant to file parent buffer
  ...
2024-06-17 15:55:58 -07:00
Junio C Hamano 40a163f217 Merge branch 'ps/ref-storage-migration'
A new command has been added to migrate a repository that uses the
files backend for its ref storage to use the reftable backend, with
limitations.

* ps/ref-storage-migration:
  builtin/refs: new command to migrate ref storage formats
  refs: implement logic to migrate between ref storage formats
  refs: implement removal of ref storages
  worktree: don't store main worktree twice
  reftable: inline `merged_table_release()`
  refs/files: fix NULL pointer deref when releasing ref store
  refs/files: extract function to iterate through root refs
  refs/files: refactor `add_pseudoref_and_head_entries()`
  refs: allow to skip creation of reflog entries
  refs: pass storage format to `ref_store_init()` explicitly
  refs: convert ref storage format to an enum
  setup: unset ref storage when reinitializing repository version
2024-06-17 15:55:55 -07:00
Patrick Steinhardt 8a676bdc5c hash-ll: merge with "hash.h"
The "hash-ll.h" header was introduced via d1cbe1e6d8 (hash-ll.h: split
out of hash.h to remove dependency on repository.h, 2023-04-22) to make
explicit the split between hash-related functions that rely on the
global `the_repository`, and those that don't. This split is no longer
necessary now that we we have removed the reliance on `the_repository`.

Merge "hash-ll.h" back into "hash.h". This causes some code units to not
include "repository.h" anymore, which requires us to add some forward
declarations.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Patrick Steinhardt e7da938570 global: introduce USE_THE_REPOSITORY_VARIABLE macro
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.

It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.

Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit

For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).

Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Patrick Steinhardt 9da95bda74 hash: require hash algorithm in oidread() and oidclr()
Both `oidread()` and `oidclr()` use `the_repository` to derive the hash
function that shall be used. Require callers to pass in the hash
algorithm to get rid of this implicit dependency.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:32 -07:00
Junio C Hamano 092b33da2b Merge branch 'ps/ref-storage-migration' into ps/use-the-repository
* ps/ref-storage-migration:
  builtin/refs: new command to migrate ref storage formats
  refs: implement logic to migrate between ref storage formats
  refs: implement removal of ref storages
  worktree: don't store main worktree twice
  reftable: inline `merged_table_release()`
  refs/files: fix NULL pointer deref when releasing ref store
  refs/files: extract function to iterate through root refs
  refs/files: refactor `add_pseudoref_and_head_entries()`
  refs: allow to skip creation of reflog entries
  refs: pass storage format to `ref_store_init()` explicitly
  refs: convert ref storage format to an enum
  setup: unset ref storage when reinitializing repository version
2024-06-13 09:39:08 -07:00
Patrick Steinhardt 23c32511b3 refs/reftable: stop micro-optimizing refname allocations on copy
When copying refs, we execute `write_copy_table()` to write the new
table. As the names are given to us via `arg->newname` and
`arg->oldname`, respectively, we optimize away some allocations by
assigning those fields to the reftable records we are about to write
directly, without duplicating them. This requires us to cast the input
to `char *` pointers as they are in fact constant strings. Later on, we
then unset the refname for all of the records before calling
`reftable_log_record_release()` on them.

We also do this when assigning the "HEAD" constant, but here we do not
cast because its type is `char[]` by default. It's about to be turned
into `const char *` though once we enable `-Wwrite-strings` and will
thus cause another warning.

It's quite dubious whether this micro-optimization really helps. We're
about to write to disk anyway, which is going to be way slower than a
small handful of allocations. Let's drop the optimization altogther and
instead copy arguments to simplify the code and avoid the future warning
with `-Wwrite-strings`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07 10:30:48 -07:00
Karthik Nayak f1dcdd6deb reftable: pick either 'oid' or 'target' for new updates
When creating a reference transaction update, we can provide the old/new
oid/target for the update. We have checks in place to ensure that for
each old/new, either oid or target is set and not both.

In the reftable backend, when dealing with updates without the
`REF_NO_DEREF` flag, we don't selectively propagate data as needed.
Since there are no active users of the path, this is not caught. As we
want to introduce the 'symref-update' command in the upcoming commit,
which would use this flow, correct it.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07 10:25:45 -07:00
Karthik Nayak aa6e99f122 refs: specify error for regular refs with old_target
When a reference update tries to update a symref, but the ref in
question is actually a regular ref, we raise an error. However the error
raised in this situation is:

  verifying symref target: '<ref>': reference is missing but expected <old-target>

which is very generic and doesn't indicate the mismatch of types. Let's
make this error more specific:

  cannot lock ref '<ref>': expected symref with target '<old-target>': but is a regular ref

so that users have a clearer understanding.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07 10:25:44 -07:00
Karthik Nayak aba381c090 refs: create and use ref_update_expects_existing_old_ref()
The files and reftable backend, need to check if a ref must exist, so
that the required validation can be done. A ref must exist only when the
`old_oid` value of the update has been explicitly set and it is not the
`null_oid` value.

Since we also support symrefs now, we need to ensure that even when
`old_target` is set a ref must exist. While this was missed when we
added symref support in transactions, there are no active users of this
path. As we introduce the 'symref-verify' command in the upcoming
commits, it is important to fix this.

So let's export this to a function called
`ref_update_expects_existing_old_ref()` and expose it internally via
'refs-internal.h'.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07 10:25:44 -07:00
Patrick Steinhardt 64a6dd8ffc refs: implement removal of ref storages
We're about to introduce logic to migrate ref storages. One part of the
migration will be to delete the files that are part of the old ref
storage format. We don't yet have a way to delete such data generically
across ref backends though.

Implement a new `delete` callback and expose it via a new
`ref_storage_delete()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 09:04:33 -07:00
Patrick Steinhardt b3e098d6e7 refs/files: fix NULL pointer deref when releasing ref store
The `free_ref_cache()` function is not `NULL` safe and will thus
segfault when being passed such a pointer. This can easily happen when
trying to release a partially initialized "files" ref store. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 09:04:32 -07:00
Patrick Steinhardt 120b67172f refs/files: extract function to iterate through root refs
Extract a new function that can be used to iterate through all root refs
known to the "files" backend. This will be used in the next commit,
where we start to teach ref backends to remove themselves.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 09:04:32 -07:00
Patrick Steinhardt 66275a6311 refs/files: refactor add_pseudoref_and_head_entries()
The `add_pseudoref_and_head_entries()` function accepts both the ref
store as well as a directory name as input. This is unnecessary though
as the ref store already uniquely identifies the root directory of the
ref store anyway.

Furthermore, the function is misnamed now that we have clarified the
meaning of pseudorefs as it doesn't add pseudorefs, but root refs.
Rename it accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 09:04:32 -07:00
Patrick Steinhardt fbd1a693c7 refs: allow to skip creation of reflog entries
The ref backends do not have any way to disable the creation of reflog
entries. This will be required for upcoming ref format migration logic
so that we do not create any entries that didn't exist in the original
ref database.

Provide a new `REF_SKIP_CREATE_REFLOG` flag that allows the caller to
disable reflog entry creation.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06 09:04:31 -07:00
Junio C Hamano 988499e295 Merge branch 'ps/refs-without-the-repository-updates'
Further clean-up the refs subsystem to stop relying on
the_repository, and instead use the repository associated to the
ref_store object.

* ps/refs-without-the-repository-updates:
  refs/packed: remove references to `the_hash_algo`
  refs/files: remove references to `the_hash_algo`
  refs/files: use correct repository
  refs: remove `dwim_log()`
  refs: drop `git_default_branch_name()`
  refs: pass repo when peeling objects
  refs: move object peeling into "object.c"
  refs: pass ref store when detecting dangling symrefs
  refs: convert iteration over replace refs to accept ref store
  refs: retrieve worktree ref stores via associated repository
  refs: refactor `resolve_gitlink_ref()` to accept a repository
  refs: pass repo when retrieving submodule ref store
  refs: track ref stores via strmap
  refs: implement releasing ref storages
  refs: rename `init_db` callback to avoid confusion
  refs: adjust names for `init` and `init_db` callbacks
2024-05-30 14:15:13 -07:00
Junio C Hamano 67ce50ba26 Merge branch 'ps/reftable-reusable-iterator'
Code clean-up to make the reftable iterator closer to be reusable.

* ps/reftable-reusable-iterator:
  reftable/merged: adapt interface to allow reuse of iterators
  reftable/stack: provide convenience functions to create iterators
  reftable/reader: adapt interface to allow reuse of iterators
  reftable/generic: adapt interface to allow reuse of iterators
  reftable/generic: move seeking of records into the iterator
  reftable/merged: simplify indices for subiterators
  reftable/merged: split up initialization and seeking of records
  reftable/reader: set up the reader when initializing table iterator
  reftable/reader: inline `reader_seek_internal()`
  reftable/reader: separate concerns of table iter and reftable reader
  reftable/reader: unify indexed and linear seeking
  reftable/reader: avoid copying index iterator
  reftable/block: use `size_t` to track restart point index
2024-05-30 14:15:12 -07:00
Junio C Hamano 23528d352a Merge branch 'ps/reftable-write-options'
The knobs to tweak how reftable files are written have been made
available as configuration variables.

* ps/reftable-write-options:
  refs/reftable: allow configuring geometric factor
  reftable: make the compaction factor configurable
  refs/reftable: allow disabling writing the object index
  refs/reftable: allow configuring restart interval
  reftable: use `uint16_t` to track restart interval
  refs/reftable: allow configuring block size
  reftable/dump: support dumping a table's block structure
  reftable/writer: improve error when passed an invalid block size
  reftable/writer: drop static variable used to initialize strbuf
  reftable: pass opts as constant pointer
  reftable: consistently refer to `reftable_write_options` as `opts`
2024-05-30 14:15:11 -07:00
Junio C Hamano 16a592f132 Merge branch 'ps/pseudo-ref-terminology'
Terminology to call various ref-like things are getting
straightened out.

* ps/pseudo-ref-terminology:
  refs: refuse to write pseudorefs
  ref-filter: properly distinuish pseudo and root refs
  refs: pseudorefs are no refs
  refs: classify HEAD as a root ref
  refs: do not check ref existence in `is_root_ref()`
  refs: rename `is_special_ref()` to `is_pseudo_ref()`
  refs: rename `is_pseudoref()` to `is_root_ref()`
  Documentation/glossary: define root refs as refs
  Documentation/glossary: clarify limitations of pseudorefs
  Documentation/glossary: redefine pseudorefs as special refs
2024-05-28 11:17:06 -07:00
Junio C Hamano 86a49253a6 Merge branch 'it/refs-name-conflict'
Expose "name conflict" error when a ref creation fails due to D/F
conflict in the ref namespace, to improve an error message given by
"git fetch".

* it/refs-name-conflict:
  refs: return conflict error when checking packed refs
2024-05-23 11:04:27 -07:00
Junio C Hamano 939d49e9bd Merge branch 'kn/ref-transaction-symref' into kn/update-ref-symref
* kn/ref-transaction-symref:
  refs: remove `create_symref` and associated dead code
  refs: rename `refs_create_symref()` to `refs_update_symref()`
  refs: use transaction in `refs_create_symref()`
  refs: add support for transactional symref updates
  refs: move `original_update_refname` to 'refs.c'
  refs: support symrefs in 'reference-transaction' hook
  files-backend: extract out `create_symref_lock()`
  refs: accept symref values in `ref_transaction_update()`
2024-05-23 09:38:59 -07:00
Junio C Hamano 0ff6d23a0f Merge branch 'ps/pseudo-ref-terminology' into ps/ref-storage-migration
* ps/pseudo-ref-terminology:
  refs: refuse to write pseudorefs
  ref-filter: properly distinuish pseudo and root refs
  refs: pseudorefs are no refs
  refs: classify HEAD as a root ref
  refs: do not check ref existence in `is_root_ref()`
  refs: rename `is_special_ref()` to `is_pseudo_ref()`
  refs: rename `is_pseudoref()` to `is_root_ref()`
  Documentation/glossary: define root refs as refs
  Documentation/glossary: clarify limitations of pseudorefs
  Documentation/glossary: redefine pseudorefs as special refs
2024-05-23 09:14:32 -07:00
Junio C Hamano e55f364398 Merge branch 'ps/refs-without-the-repository-updates' into ps/ref-storage-migration
* ps/refs-without-the-repository-updates:
  refs/packed: remove references to `the_hash_algo`
  refs/files: remove references to `the_hash_algo`
  refs/files: use correct repository
  refs: remove `dwim_log()`
  refs: drop `git_default_branch_name()`
  refs: pass repo when peeling objects
  refs: move object peeling into "object.c"
  refs: pass ref store when detecting dangling symrefs
  refs: convert iteration over replace refs to accept ref store
  refs: retrieve worktree ref stores via associated repository
  refs: refactor `resolve_gitlink_ref()` to accept a repository
  refs: pass repo when retrieving submodule ref store
  refs: track ref stores via strmap
  refs: implement releasing ref storages
  refs: rename `init_db` callback to avoid confusion
  refs: adjust names for `init` and `init_db` callbacks
2024-05-23 09:14:08 -07:00
Junio C Hamano 4beb7a3b06 Merge branch 'kn/ref-transaction-symref'
Updates to symbolic refs can now be made as a part of ref
transaction.

* kn/ref-transaction-symref:
  refs: remove `create_symref` and associated dead code
  refs: rename `refs_create_symref()` to `refs_update_symref()`
  refs: use transaction in `refs_create_symref()`
  refs: add support for transactional symref updates
  refs: move `original_update_refname` to 'refs.c'
  refs: support symrefs in 'reference-transaction' hook
  files-backend: extract out `create_symref_lock()`
  refs: accept symref values in `ref_transaction_update()`
2024-05-20 11:20:04 -07:00
Patrick Steinhardt 00892786b8 refs/packed: remove references to the_hash_algo
Remove references to `the_hash_algo` in favor of the hash algo specified
by the repository associated with the packed ref store.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:40 -07:00
Patrick Steinhardt c1026b9d7d refs/files: remove references to the_hash_algo
Remove references to `the_hash_algo` in favor of the hash algo specified
by the repository associated with the files ref store.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:40 -07:00
Patrick Steinhardt c9e9723e1f refs/files: use correct repository
There are several places in the "files" backend where we use
`the_repository` instead of the repository associated with the ref store
itself. Adapt those to use the correct repository.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:40 -07:00
Patrick Steinhardt 30aaff437f refs: pass repo when peeling objects
Both `peel_object()` and `peel_iterated_oid()` implicitly rely on
`the_repository` to look up objects. Despite the fact that we want to
get rid of `the_repository`, it also leads to some restrictions in our
ref iterators when trying to retrieve the peeled value for a repository
other than `the_repository`.

Refactor these functions such that both take a repository as argument
and remove the now-unnecessary restrictions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:39 -07:00
Patrick Steinhardt 19c76e8235 refs: move object peeling into "object.c"
Peeling an object has nothing to do with refs, but we still have the
code in "refs.c". Move it over into "object.c", which is a more natural
place to put it.

Ideally, we'd also move `peel_iterated_oid()` over into "object.c". But
this function is tied to the refs interfaces because it uses a global
ref iterator variable to optimize peeling when the iterator already has
the peeled object ID readily available.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:39 -07:00
Patrick Steinhardt 8378c9d27b refs: convert iteration over replace refs to accept ref store
The function `for_each_replace_ref()` is a bit of an oddball across the
refs interfaces as it accepts a pointer to the repository instead of a
pointer to the ref store. The only reason for us to accept a repository
is so that we can eventually pass it back to the callback function that
the caller has provided. This is somewhat arbitrary though, as callers
that need the repository can instead make it accessible via the callback
payload.

Refactor the function to instead accept the ref store and adjust callers
accordingly. This allows us to get rid of some of the boilerplate that
we had to carry to pass along the repository and brings us in line with
the other functions that iterate through refs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:38 -07:00
Patrick Steinhardt 965f8991e5 refs: pass repo when retrieving submodule ref store
Looking up submodule ref stores has two deficiencies:

  - The initialized subrepo will be attributed to `the_repository`.

  - The submodule ref store will be tracked in a global map.

This makes it impossible to have submodule ref stores for a repository
other than `the_repository`.

Modify the function to accept the parent repository as parameter and
move the global map into `struct repository`. Like this it becomes
possible to look up submodule ref stores for arbitrary repositories.

Note that this also adds a new reference to `the_repository` in
`resolve_gitlink_ref()`, which is part of the refs interfaces. This will
get adjusted in the next patch.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:37 -07:00
Patrick Steinhardt 71c871b48d refs: implement releasing ref storages
Ref storages are typically only initialized once for `the_repository`
and then never released. Until now we got away with that without causing
memory leaks because `the_repository` stays reachable, and because the
ref backend is reachable via `the_repository` its memory basically never
leaks.

This is about to change though because of the upcoming migration logic,
which will create a secondary ref storage. In that case, we will either
have to release the old or new ref storage to avoid leaks.

Implement a new `release` callback and expose it via a new
`ref_storage_release()` function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:37 -07:00
Patrick Steinhardt ed93ea1602 refs: rename init_db callback to avoid confusion
Reference backends have two callbacks `init` and `init_db`. The
similarity of these two callbacks has repeatedly confused me whenever I
was looking at them, where I always had to look up which of them does
what.

Rename the `init_db` callback to `create_on_disk`, which should
hopefully be clearer.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:36 -07:00
Patrick Steinhardt 1febabff7a refs: adjust names for init and init_db callbacks
The names of the functions that implement the `init` and `init_db`
callbacks in the "files" and "packed" backends do not match the names of
the callbacks, which is inconsistent. Rename them so that they match,
which makes it easier to discover their respective implementations.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:36 -07:00
Patrick Steinhardt 31951c2248 refs: classify HEAD as a root ref
Root refs are those refs that live in the root of the ref hierarchy.
Our old and venerable "HEAD" reference falls into this category, but we
don't yet classify it as such in `is_root_ref()`.

Adapt the function to also treat "HEAD" as a root ref. This change is
safe to do for all current callers:

  - `ref_kind_from_refname()` already handles "HEAD" explicitly before
    calling `is_root_ref()`.

  - The "files" and "reftable" backends explicitly call both
    `is_root_ref()` and `is_headref()` together.

This also aligns behaviour or `is_root_ref()` and `is_headref()` such
that we stop checking for ref existence. This changes semantics for our
backends:

  - In the reftable backend we already know that the ref must exist
    because `is_headref()` is called as part of the ref iterator. The
    existence check is thus redundant, and the change is safe to do.

  - In the files backend we use it when populating root refs, where we
    would skip adding the "HEAD" file if it was not possible to resolve
    it. The new behaviour is to instead mark "HEAD" as broken, which
    will cause us to emit warnings in various places.

As there are no callers of `is_headref()` left afer the refactoring, we
can absorb it completely into `is_root_ref()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-15 07:30:52 -07:00
Patrick Steinhardt afcd067dad refs: do not check ref existence in is_root_ref()
Before this patch series, root refs except for "HEAD" and our special
refs were classified as pseudorefs. Furthermore, our terminology
clarified that pseudorefs must not be symbolic refs. This restriction
is enforced in `is_root_ref()`, which explicitly checks that a supposed
root ref resolves to an object ID without recursing.

This has been extremely confusing right from the start because (in old
terminology) a ref name may sometimes be a pseudoref and sometimes not
depending on whether it is a symbolic or regular ref. This behaviour
does not seem reasonable at all and I very much doubt that it results in
anything sane.

Last but not least, the current behaviour can actually lead to a
segfault when calling `is_root_ref()` with a reference that either does
not exist or that is a symbolic ref because we never initialized `oid`,
but then read it via `is_null_oid()`.

We have now changed terminology to clarify that pseudorefs are really
only "MERGE_HEAD" and "FETCH_HEAD", whereas all the other refs that live
in the root of the ref hierarchy are just plain refs. Thus, we do not
need to check whether the ref is symbolic or not. In fact, we can now
avoid looking up the ref completely as the name is sufficient for us to
figure out whether something would be a root ref or not.

This change of course changes semantics for our callers. As there are
only three of them we can assess each of them individually:

  - "ref-filter.c:ref_kind_from_refname()" uses it to classify refs.
    It's clear that the intent is to classify based on the ref name,
    only.

  - "refs/reftable_backend.c:reftable_ref_iterator_advance()" uses it to
    filter root refs. Again, using existence checks is pointless here as
    the iterator has just surfaced the ref, so we know it does exist.

  - "refs/files_backend.c:add_pseudoref_and_head_entries()" uses it to
    determine whether it should add a ref to the root directory of its
    iterator. This had the effect that we skipped over any files that
    are either a symbolic ref, or which are not a ref at all.

    The new behaviour is to include symbolic refs know, which aligns us
    with the adapted terminology. Furthermore, files which look like
    root refs but aren't are now mark those as "broken". As broken refs
    are not surfaced by our tooling, this should not lead to a change in
    user-visible behaviour, but may cause us to emit warnings. This
    feels like the right thing to do as we would otherwise just silently
    ignore corrupted root refs completely.

So in all cases the existence check was either superfluous, not in line
with the adapted terminology or masked potential issues. This commit
thus changes the behaviour as proposed and drops the existence check
altogether.

Add a test that verifies that this does not change user-visible
behaviour. Namely, we still don't want to show broken refs to the user
by default in git-for-each-ref(1). What this does allow though is for
internal callers to surface dangling root refs when they pass in the
`DO_FOR_EACH_INCLUDE_BROKEN` flag.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-15 07:30:52 -07:00
Patrick Steinhardt f6936e62a5 refs: rename is_pseudoref() to is_root_ref()
Rename `is_pseudoref()` to `is_root_ref()` to adapt to the newly defined
terminology in our gitglossary(7).

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-15 07:30:51 -07:00
Patrick Steinhardt 08efe69212 reftable/stack: provide convenience functions to create iterators
There exist a bunch of call sites in the reftable backend that want to
create iterators for a reftable stack. This is rather convoluted right
now, where you always have to go via the merged table. And it is about
to become even more convoluted when we split up iterator initialization
and seeking in the next commit.

Introduce convenience functions that allow the caller to create an
iterator from a reftable stack directly without going through the merged
table. Adapt callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13 17:04:19 -07:00
Patrick Steinhardt f518d91a2b refs/reftable: allow configuring geometric factor
Allow configuring the geometric factor used by the auto-compaction
algorithm whenever a new table is appended to the stack of tables.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13 17:02:39 -07:00
Patrick Steinhardt afbdbfae0b refs/reftable: allow disabling writing the object index
Besides the expected "ref" and "log" records, the reftable library also
writes "obj" records. These are basically a reverse mapping of object
IDs to their respective ref records so that it becomes efficient to
figure out which references point to a specific object. The motivation
for this data structure is the "uploadpack.allowTipSHA1InWant" config,
which allows a client to fetch any object by its hash that has a ref
pointing to it.

This reverse index is not used by Git at all though, and the expectation
is that most hosters nowadays use "uploadpack.allowAnySHA1InWant". It
may thus be preferable for many users to disable writing these optional
object indices altogether to safe some precious disk space.

Add a new config "reftable.indexObjects" that allows the user to disable
the object index altogether.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13 17:02:38 -07:00
Patrick Steinhardt 90db611c2a refs/reftable: allow configuring restart interval
Add a new option `reftable.restartInterval` that allows the user to
control the restart interval when writing reftable records used by the
reftable library.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13 17:02:38 -07:00
Patrick Steinhardt 831b366c24 refs/reftable: allow configuring block size
Add a new option `reftable.blockSize` that allows the user to control
the block size used by the reftable library.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13 17:02:38 -07:00
Patrick Steinhardt 799237852b reftable: pass opts as constant pointer
We sometimes pass the refatble write options as value and sometimes as a
pointer. This is quite confusing and makes the reader wonder whether the
options get modified sometimes.

In fact, `reftable_new_writer()` does cause the caller-provided options
to get updated when some values aren't set up. This is quite unexpected,
but didn't cause any harm until now.

Adapt the code so that we do not modify the caller-provided values
anymore. While at it, refactor the code to code to consistently pass the
options as a constant pointer to clarify that the caller-provided opts
will not ever get modified.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13 17:02:38 -07:00
Junio C Hamano 5aec7231c8 Merge branch 'ps/reftable-write-optim'
Code to write out reftable has seen some optimization and
simplification.

* ps/reftable-write-optim:
  reftable/block: reuse compressed array
  reftable/block: reuse zstream when writing log blocks
  reftable/writer: reset `last_key` instead of releasing it
  reftable/writer: unify releasing memory
  reftable/writer: refactorings for `writer_flush_nonempty_block()`
  reftable/writer: refactorings for `writer_add_record()`
  refs/reftable: don't recompute committer ident
  reftable: remove name checks
  refs/reftable: skip duplicate name checks
  refs/reftable: perform explicit D/F check when writing symrefs
  refs/reftable: fix D/F conflict error message on ref copy
2024-05-08 10:18:43 -07:00
Karthik Nayak 4865707bda refs: remove create_symref and associated dead code
In the previous commits, we converted `refs_create_symref()` to utilize
transactions to perform symref updates. Earlier `refs_create_symref()`
used `create_symref()` to do the same.

We can now remove `create_symref()` and any code associated with it
which is no longer used. We remove `create_symref()` code from all the
reference backends and also remove it entirely from the `ref_storage_be`
struct.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07 08:51:50 -07:00
Karthik Nayak 644daf7785 refs: add support for transactional symref updates
The reference backends currently support transactional reference
updates. While this is exposed to users via 'git-update-ref' and its
'--stdin' mode, it is also used internally within various commands.

However, we do not support transactional updates of symrefs. This commit
adds support for symrefs in both the 'files' and the 'reftable' backend.

Here, we add and use `ref_update_has_null_new_value()`, a helper
function which is used to check if there is a new_value in a reference
update. The new value could either be a symref target `new_target` or a
OID `new_oid`.

We also add another common function `ref_update_check_old_target` which
will be used to check if the update's old_target corresponds to a
reference's current target.

Now transactional updates (verify, create, delete, update) can be used
for:
- regular refs
- symbolic refs
- conversion of regular to symbolic refs and vice versa

This also allows us to expose this to users via new commands in
'git-update-ref' in the future.

Note that a dangling symref update does not record a new reflog entry,
which is unchanged before and after this commit.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07 08:51:49 -07:00
Karthik Nayak e9965ba477 refs: move original_update_refname to 'refs.c'
The files backend and the reftable backend implement
`original_update_refname` to obtain the original refname of the update.
Move it out to 'refs.c' and only expose it internally to the refs
library. This will be used in an upcoming commit to also introduce
another common functionality for the two backends.

We also rename the function to `ref_update_original_update_refname` to
keep it consistent with the upcoming other 'ref_update_*' functions
that'll be introduced.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07 08:51:49 -07:00
Karthik Nayak 57d0b1e2ea files-backend: extract out create_symref_lock()
The function `create_symref_locked()` creates a symref by creating a
'<symref>.lock' file and then committing the symref lock, which creates
the final symref.

Extract the early half of `create_symref_locked()` into a new helper
function `create_symref_lock()`. Because the name of the new function is
too similar to the original, rename the original to
`create_and_commit_symref()` to avoid confusion.

The new function `create_symref_locked()` can be used to create the
symref lock in a separate step from that of committing it. This allows
to add transactional support for symrefs, where the lock would be
created in the preparation step and the lock would be committed in the
finish step.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07 08:51:49 -07:00