We want ref_stores to be polymorphic, so invent a base class of which
files_ref_store is a derived class. For now there is exactly one
ref_store for the main repository and one for any submodules whose
references have been accessed.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a `struct ref_storage_be` to represent types of reference stores. In
OO notation, this is the class, and will soon hold some class
methods (e.g., a factory to create new ref_store instances) and will
also serve as the vtable for ref_store instances of that type.
As yet, the backends cannot do anything.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The greater goal of this patch series is to develop the concept of a
reference store, which is a place that references, their values, and
their reflogs are stored, and to virtualize the reference interface so
that different types of ref_stores can be implemented. We will then, for
example, use ref_store instances to access submodule references and
worktree references.
Currently, we keep a ref_cache for each submodule that has had its
references iterated over. It is a far cry from a ref_store, but they are
stored the way we will want to store ref_stores, and ref_stores will
eventually have to hold the reference caches. So let's treat ref_caches
as embryo ref_stores, and build them out from there.
As the first step, simply rename `ref_cache` to `files_ref_store`, and
rename some functions and attributes correspondingly.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow references with reflogs to be iterated over using a ref_iterator.
The latter is implemented as a files_reflog_iterator, which in turn uses
dir_iterator to read the "logs" directory.
Note that reflog iteration doesn't correctly handle per-worktree
reflogs (either before or after this patch).
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If there is a file under "$GIT_DIR/logs" with no corresponding
reference, the old code was emitting an error message, aborting the
reflog iteration, and returning -1. But
* None of the callers was checking the exit value
* The callers all want to find all legitimate reflogs (sometimes for the
purpose of determining object reachability!) and wouldn't benefit from
a truncated iteration anyway.
So instead, emit an error message and skip the "broken" reflog, but
continue with the iteration.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the reference iterator interface to implement do_for_each_ref().
Delete a bunch of code supporting the old for_each_ref() implementation.
And now that do_for_each_ref() is generic code (it is no longer tied to
the files backend), move it to refs.c.
The implementation is via a new function, do_for_each_ref_iterator(),
which takes a reference iterator as argument and calls a callback
function for each of the references in the iterator.
This change requires the current_ref performance hack for peel_ref() to
be implemented via ref_iterator_peel() rather than peel_entry() because
we don't have a ref_entry handy (it is hidden under three layers:
file_ref_iterator, merge_ref_iterator, and cache_ref_iterator). So:
* do_for_each_ref_iterator() records the active iterator in
current_ref_iter while it is running.
* peel_ref() checks whether current_ref_iter is pointing at the
requested reference. If so, it asks the iterator to peel the
reference (which it can do efficiently via its "peel" virtual
function). For extra safety, we do the optimization only if the
refname *addresses* are the same, not only if the refname *strings*
are the same, to forestall possible mixups between refnames that come
from different ref_iterators.
Please note that this optimization of peel_ref() is only available when
iterating via do_for_each_ref_iterator() (including all of the
for_each_ref() functions, which call it indirectly). It would be
complicated to implement a similar optimization when iterating directly
using a reference iterator, because multiple reference iterators can be
in use at the same time, with interleaved calls to
ref_iterator_advance(). (In fact we do exactly that in
merge_ref_iterator.)
But that is not necessary. peel_ref() is only called while iterating
over references. Callers who iterate using the for_each_ref() functions
benefit from the optimization described above. Callers who iterate using
reference iterators directly have access to the ref_iterator, so they
can call ref_iterator_peel() themselves to get an analogous optimization
in a more straightforward manner.
If we rewrite all callers to use the reference iteration API, then we
can remove the current_ref_iter hack permanently.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, the API for iterating over references is via a family of
for_each_ref()-type functions that invoke a callback function for each
selected reference. All of these eventually call do_for_each_ref(),
which knows how to do one thing: iterate in parallel through two
ref_caches, one for loose and one for packed refs, giving loose
references precedence over packed refs. This is rather complicated code,
and is quite specialized to the files backend. It also requires callers
to encapsulate their work into a callback function, which often means
that they have to define and use a "cb_data" struct to manage their
context.
The current design is already bursting at the seams, and will become
even more awkward in the upcoming world of multiple reference storage
backends:
* Per-worktree vs. shared references are currently handled via a kludge
in git_path() rather than iterating over each part of the reference
namespace separately and merging the results. This kludge will cease
to work when we have multiple reference storage backends.
* The current scheme is inflexible. What if we sometimes want to bypass
the ref_cache, or use it only for packed or only for loose refs? What
if we want to store symbolic refs in one type of storage backend and
non-symbolic ones in another?
In the future, each reference backend will need to define its own way of
iterating over references. The crux of the problem with the current
design is that it is impossible to compose for_each_ref()-style
iterations, because the flow of control is owned by the for_each_ref()
function. There is nothing that a caller can do but iterate through all
references in a single burst, so there is no way for it to interleave
references from multiple backends and present the result to the rest of
the world as a single compound backend.
This commit introduces a new iteration primitive for references: a
ref_iterator. A ref_iterator is a polymorphic object that a reference
storage backend can be asked to instantiate. There are three functions
that can be applied to a ref_iterator:
* ref_iterator_advance(): move to the next reference in the iteration
* ref_iterator_abort(): end the iteration before it is exhausted
* ref_iterator_peel(): peel the reference currently being looked at
Iterating using a ref_iterator leaves the flow of control in the hands
of the caller, which means that ref_iterators from multiple
sources (e.g., loose and packed refs) can be composed and presented to
the world as a single compound ref_iterator.
It also means that the backend code for implementing reference iteration
will sometimes be more complicated. For example, the
cache_ref_iterator (which iterates over a ref_cache) can't use the C
stack to recurse; instead, it must manage its own stack internally as
explicit data structures. There is also a lot of boilerplate connected
with object-oriented programming in C.
Eventually, end-user callers will be able to be written in a more
natural way—managing their own flow of control rather than having to
work via callbacks. Since there will only be a few reference backends
but there are many consumers of this API, this is a good tradeoff.
More importantly, we gain composability, and especially the possibility
of writing interchangeable parts that can work with any ref_iterator.
For example, merge_ref_iterator implements a generic way of merging the
contents of any two ref_iterators. It is used to merge loose + packed
refs as part of the implementation of the files_ref_iterator. But it
will also be possible to use it to merge other pairs of reference
sources (e.g., per-worktree vs. shared refs).
Another example is prefix_ref_iterator, which can be used to trim a
prefix off the front of reference names before presenting them to the
caller (e.g., "refs/heads/master" -> "master").
In this patch, we introduce the iterator abstraction and many utilities,
and implement a reference iterator for the files ref storage backend.
(I've written several other obvious utilities, for example a generic way
to filter references being iterated over. These will probably be useful
in the future. But they are not needed for this patch series, so I am
not including them at this time.)
In a moment we will rewrite do_for_each_ref() to work via reference
iterators (allowing some special-purpose code to be discarded), and do
something similar for reflogs. In future patch series, we will expose
the ref_iterator abstraction in the public refs API so that callers can
use it directly.
Implementation note: I tried abstracting this a layer further to allow
generic iterators (over arbitrary types of objects) and generic
utilities like a generic merge_iterator. But the implementation in C was
very cumbersome, involving (in my opinion) too much boilerplate and too
much unsafe casting, some of which would have had to be done on the
caller side. However, I did put a few iterator-related constants in a
top-level header file, iterator.h, as they will be useful in a moment to
implement iteration over directory trees and possibly other types of
iterators in the future.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract new function ref_resolves_to_object() from
entry_resolves_to_object(). It can be used even if there is no ref_entry
at hand.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Free up the old name for a more general purpose.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If there is not a nonbare repository where a submodule is supposedly
located, then don't instantiate a ref_cache for it.
The analogous check can be removed from resolve_gitlink_ref().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will be useful for passing REF_NODEREF through.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the context of the for_each_ref() functions, call the prefix that
references must start with "prefix". (In some places it was called
"base".) This is clearer, and also prevents confusion with another
planned use of the word "base".
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now lock_ref_sha1_basic() is only called with flags==REF_NODEREF. So we
don't have to handle other cases anymore.
This enables several simplifications, the most interesting of which come
from the fact that ref_lock::orig_ref_name is now always the same as
ref_lock::ref_name:
* Remove ref_lock::orig_ref_name
* Remove local variable orig_refname from lock_ref_sha1_basic()
* ref_name can be initialize once and its value reused
* commit_ref_update() never has to write to the reflog for
lock->orig_ref_name
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
If a transaction includes a non-NODEREF update to a symbolic reference,
we don't have to look it up in lock_ref_for_update(). The reference will
be dereferenced anyway when the split-off update is processed.
This change requires that we store a backpointer from the split-off
update to its parent update, for two reasons:
* We still want to report the original reference name in error messages.
So if an error occurs when checking the split-off update's old_sha1,
walk the parent_update pointers back to find the original reference
name, and report that one.
* We still need to write the old_sha1 of the symref to its reflog. So
after we read the split-off update's reference value, walk the
parent_update pointers back and fill in their old_sha1 fields.
Aside from eliminating unnecessary reads, this change fixes a
subtle (though not very serious) race condition: in the old code, the
old_sha1 of the symref was resolved before the reference that it pointed
at was locked. So it was possible that the old_sha1 value logged to the
symref's reflog could be wrong if another process changed the downstream
reference before it was locked.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Before the previous patch, our first read of the reference happened
before the reference was locked, so we couldn't trust its value and had
to read it again. But now that our first read of the reference happens
after acquiring the lock, there is no need to read it a second time. So
move the read_ref_full() call into the (update->type & REF_ISSYMREF)
block.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Before committing ref updates, split symbolic ref updates into two
parts: an update to the underlying ref, and a log-only update to the
symbolic ref. This ensures that both references are locked correctly
during the transaction, including while their reflogs are updated.
Similarly, if the reference pointed to by HEAD is modified directly, add
a separate log-only update to HEAD, rather than leaving the job of
updating HEAD's reflog to commit_ref_update(). This change ensures that
HEAD is locked correctly while its reflog is being modified, as well as
being cheaper (HEAD only needs to be resolved once).
This makes use of a new function, lock_raw_ref(), which is analogous to
read_raw_ref(), but acquires a lock on the reference before reading it.
This change still has two problems:
* There are redundant read_ref_full() reference lookups.
* It is still possible to get incorrect reflogs for symbolic references
if there is a concurrent update by another process, since the old_oid
of a symref is determined before the lock on the pointed-to ref is
held.
Both problems will soon be fixed.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
WIP
When renaming refs, don't dereference either the origin or the destination
before renaming.
The origin does not need to be dereferenced because it is presently
forbidden to rename symbolic refs.
Not dereferencing the destination fixes a bug where renaming on top of
a broken symref would use the pointed-to ref name for the moved
reflog.
Add a test for the reflog bug.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
The refs infrastructure learns about log-only ref updates, which only
update the reflog. Later, we will use this to separate symbolic
reference resolution from ref updating.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
It is nonsensical (and a little bit dangerous) to use REF_ISPRUNING
without REF_NODEREF. Forbid it explicitly. Change the one REF_ISPRUNING
caller to pass REF_NODEREF too.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
* Always start error messages with a lower-case letter.
* Always enclose reference names in single quotes.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
resolve_ref_unsafe() can cope with being called with NULL passed to its
flags argument. So lock_ref_sha1_basic() can just hand its own type
parameter through.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Among other things, document the (important!) requirement that input
refname be checked for safety before calling this function.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
This will hopefully reduce confusion with the "flags" arguments that are
used in many functions in this module as an input parameter to choose
how the function should operate.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
These microoptimizations don't make a significant difference in speed.
And they cause problems if somebody ever wants to modify the function to
add updates to a transaction as part of processing it, as will happen
shortly.
Make the same changes in initial_ref_transaction_commit().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Even if there is an empty directory where we look for the loose version
of a reference, check for a packed reference before giving up. This
fixes the failing test that was introduced two commits ago.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Part of the bug revealed in the last commit is that resolve_ref_unsafe()
incorrectly returns EISDIR if it finds a directory in the place where it
is looking for a loose reference, even if the corresponding packed
reference exists. lock_ref_sha1_basic() notices the bogus EISDIR, and
use it as an indication that it should call remove_empty_directories()
and call resolve_ref_unsafe() again.
But resolve_ref_unsafe() shouldn't report EISDIR in this case. If we
would simply make that change, then remove_empty_directories() wouldn't
get called anymore, and the empty directory would get in the way when
commit_ref() calls commit_lock_file() to rename the lockfile into place.
So instead of relying on lock_ref_sha1_basic() to delete empty
directories, teach commit_ref(), just before calling commit_lock_file(),
to check whether a directory is in the way, and if so, try to delete it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Code restructuring around the "refs" area to prepare for pluggable
refs backends.
* dt/pre-refs-backend: (24 commits)
refs: on symref reflog expire, lock symref not referrent
refs: move resolve_ref_unsafe into common code
show_head_ref(): check the result of resolve_ref_namespace()
check_aliased_update(): check that dst_name is non-NULL
checkout_paths(): remove unneeded flag variable
cmd_merge(): remove unneeded flag variable
fsck_head_link(): remove unneeded flag variable
read_raw_ref(): change flags parameter to unsigned int
files-backend: inline resolve_ref_1() into resolve_ref_unsafe()
read_raw_ref(): manage own scratch space
files-backend: break out ref reading
resolve_ref_1(): eliminate local variable "bad_name"
resolve_ref_1(): reorder code
resolve_ref_1(): eliminate local variable
resolve_ref_unsafe(): ensure flags is always set
resolve_ref_unsafe(): use for loop to count up to MAXDEPTH
resolve_missing_loose_ref(): simplify semantics
t1430: improve test coverage of deletion of badly-named refs
t1430: test for-each-ref in the presence of badly-named refs
t1430: don't rely on symbolic-ref for creating broken symrefs
...
When locking a symbolic ref to expire a reflog, lock the symbolic
ref (using REF_NODEREF) instead of its referent.
Add a test for this.
Signed-off-by: David Turner <dturner@twopensource.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that resolve_ref_unsafe's only interaction with the backend is
through read_raw_ref, we can move it into the common code. Later,
we'll replace read_raw_ref with a backend function.
Signed-off-by: David Turner <dturner@twopensource.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_raw_ref() is going to be part of the vtable for reference backends,
so clean up its interface to use "unsigned int flags" rather than "int
flags". Its caller still uses signed int for its flags arguments. But
changing that would touch a lot of code, so leave it for now.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
resolve_ref_unsafe() wasn't doing anything useful anymore.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of creating scratch space in resolve_ref_unsafe() and passing
it down through resolve_ref_1 to read_raw_ref(), teach read_raw_ref()
to manage its own scratch space. This reduces coupling across the
functions at the cost of some extra allocations.
Also, when read_raw_ref() is implemented for different reference
backends, the other implementations might have different scratch
space requirements.
Note that we now preserve errno across the calls to strbuf_release(),
which calls free() and can thus theoretically overwrite errno.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor resolve_ref_1 in terms of a new function read_raw_ref, which
is responsible for reading ref data from the ref storage.
Later, we will make read_raw_ref a pluggable backend function, and make
resolve_ref_unsafe common.
Signed-off-by: David Turner <dturner@twopensource.com>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can use (*flags & REF_BAD_NAME) for that purpose.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no need to adjust *flags if we're just about to fail.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In place of `buf`, use `refname`, which is anyway a better description
of what is being pointed at.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the caller passes flags==NULL, then set it to point at a local
scratch variable. This removes the need for a lot of "if (flags)"
guards in resolve_ref_1() and resolve_missing_loose_ref().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The loop's there anyway; we might as well use it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>