Add some documentation for the logic behind the conflict normalization
in rerere.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git rerere' is considered a porcelain command and as such its output
should be translated. Its functionality is also only enabled through
a config setting, so scripts really shouldn't rely on the output
either way.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.
* sb/object-store-grafts:
commit: allow lookup_commit_graft to handle arbitrary repositories
commit: allow prepare_commit_graft to handle arbitrary repositories
shallow: migrate shallow information into the object parser
path.c: migrate global git_path_* to take a repository argument
cache: convert get_graft_file to handle arbitrary repositories
commit: convert read_graft_file to handle arbitrary repositories
commit: convert register_commit_graft to handle arbitrary repositories
commit: convert commit_graft_pos() to handle arbitrary repositories
shallow: add repository argument to is_repository_shallow
shallow: add repository argument to check_shallow_file_for_update
shallow: add repository argument to register_shallow
shallow: add repository argument to set_alternate_shallow_file
commit: add repository argument to lookup_commit_graft
commit: add repository argument to prepare_commit_graft
commit: add repository argument to read_graft_file
commit: add repository argument to register_commit_graft
commit: add repository argument to commit_graft_pos
object: move grafts to object parser
object-store: move object access functions to object-store.h
It looks like most paths in the output in the git codebase are wrapped
in single quotes. Standardize on that in rerere as well.
Apart from being more consistent, this also makes some of the strings
match strings that are already translated in other parts of the
codebase, thus reducing the work for translators, when the strings are
marked for translation in a subsequent commit.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation/CodingGuidelines mentions that error messages should be
lowercase. Prior to marking them for translation follow that pattern
in rerere as well, so translators won't have to translate messages
that don't conform to our guidelines.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have multiple different variants of the error message we show to
the user if 'read_cache' fails. The "Could not read index" variant we
are using in 'rerere.c' is currently not used anywhere in translated
form.
As a subsequent commit will mark all output that comes from 'rerere.c'
for translation, make the life of the translators a little bit easier
by using a string that is used elsewhere, and marked for translation
there, and thus most likely already translated.
"index file corrupt" seems to be the most common error message we show
when 'read_cache' fails, so use that here as well.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up to adjust to a more recent lockfile API convention that
allows lockfile instances kept on the stack.
* ma/lockfile-cleanup:
lock_file: move static locks into functions
lock_file: make function-local locks non-static
refs.c: do not die if locking fails in `delete_pseudoref()`
refs.c: do not die if locking fails in `write_pseudoref()`
t/helper/test-write-cache: clean up lock-handling
Migrate all git_path_* functions that are defined in path.c to take a
repository argument. Unlike other patches in this series, do not use the
#define trick, as we rewrite the whole function, which is rather small.
This doesn't migrate all the functions, as other builtins have their own
local path functions defined using GIT_PATH_FUNC. So keep that macro
around to serve the other locations.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This should make these functions easier to find and cache.h less
overwhelming to read.
In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file
As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise. It would be better to #include wherever
identifiers from the header are used. That can happen later
when we have better tooling for it.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Placing `struct lock_file`s on the stack used to be a bad idea, because
the temp- and lockfile-machinery would keep a pointer into the struct.
But after 076aa2cbd (tempfile: auto-allocate tempfiles on heap,
2017-09-05), we can safely have lockfiles on the stack. (This applies
even if a user returns early, leaving a locked lock behind.)
Each of these `struct lock_file`s is used from within a single function.
Move them into the respective functions to make the scope clearer and
drop the staticness.
For good measure, I have inspected these sites and come to believe that
they always release the lock, with the possible exception of bailing out
using `die()` or `exit()` or by returning from a `cmd_foo()`.
As pointed out by Jeff King, it would be bad if someone held on to a
`struct lock_file *` for some reason. After some grepping, I agree with
his findings: no-one appears to be doing that.
After this commit, the remaining occurrences of "static struct
lock_file" are locks that are used from within different functions. That
is, they need to remain static. (Short of more intrusive changes like
passing around pointers to non-static locks.)
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (36 commits)
convert: convert to struct object_id
sha1_file: introduce a constant for max header length
Convert lookup_replace_object to struct object_id
sha1_file: convert read_sha1_file to struct object_id
sha1_file: convert read_object_with_reference to object_id
tree-walk: convert tree entry functions to object_id
streaming: convert istream internals to struct object_id
tree-walk: convert get_tree_entry_follow_symlinks internals to object_id
builtin/notes: convert static functions to object_id
builtin/fmt-merge-msg: convert remaining code to object_id
sha1_file: convert sha1_object_info* to object_id
Convert remaining callers of sha1_object_info_extended to object_id
packfile: convert unpack_entry to struct object_id
sha1_file: convert retry_bad_packed_offset to struct object_id
sha1_file: convert assert_sha1_type to object_id
builtin/mktree: convert to struct object_id
streaming: convert open_istream to use struct object_id
sha1_file: convert check_sha1_signature to struct object_id
sha1_file: convert read_loose_object to use struct object_id
builtin/index-pack: convert struct ref_delta_entry to object_id
...
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file. Do the same for read_sha1_file_extended.
Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id. Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have several callers like
if (active_cache_changed && write_locked_index(...))
handle_error();
rollback_lock_file(...);
where the final rollback is needed because "!active_cache_changed"
shortcuts the if-expression. There are also a few variants of this,
including some if-else constructs that make it more clear when the
explicit rollback is really needed.
Teach `write_locked_index()` to take a new flag SKIP_IF_UNCHANGED and
simplify the callers. Leave the most complicated of the callers (in
builtin/update-index.c) unchanged. Rewriting it to use this new flag
would end up duplicating logic.
We could have made the new flag behave the other way round
("FORCE_WRITE"), but that could break existing users behind their backs.
Let's take the more conservative approach. We can still migrate existing
callers to use our new flag. Later we might even be able to flip the
default, possibly without entirely ignoring the risk to in-flight or
out-of-tree topics.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the helper macro MOVE_ARRAY to move arrays. This is shorter and
safer, as it automatically infers the size of elements.
Patch generated by Coccinelle and contrib/coccinelle/array.cocci in
Travis CI's static analysis build job.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many codepaths did not diagnose write failures correctly when disks
go full, due to their misuse of write_in_full() helper function,
which have been corrected.
* jk/write-in-full-fix:
read_pack_header: handle signed/unsigned comparison in read result
config: flip return value of store_write_*()
notes-merge: use ssize_t for write_in_full() return value
pkt-line: check write_in_full() errors against "< 0"
convert less-trivial versions of "write_in_full() != len"
avoid "write_in_full(fd, buf, len) != len" pattern
get-tar-commit-id: check write_in_full() return against 0
config: avoid "write_in_full(fd, buf, len) < len" pattern
The return value of write_in_full() is either "-1", or the
requested number of bytes[1]. If we make a partial write
before seeing an error, we still return -1, not a partial
value. This goes back to f6aa66cb95 (write_in_full: really
write in full or return error on disk full., 2007-01-11).
So checking anything except "was the return value negative"
is pointless. And there are a couple of reasons not to do
so:
1. It can do a funny signed/unsigned comparison. If your
"len" is signed (e.g., a size_t) then the compiler will
promote the "-1" to its unsigned variant.
This works out for "!= len" (unless you really were
trying to write the maximum size_t bytes), but is a
bug if you check "< len" (an example of which was fixed
recently in config.c).
We should avoid promoting the mental model that you
need to check the length at all, so that new sites are
not tempted to copy us.
2. Checking for a negative value is shorter to type,
especially when the length is an expression.
3. Linus says so. In d34cf19b89 (Clean up write_in_full()
users, 2007-01-11), right after the write_in_full()
semantics were changed, he wrote:
I really wish every "write_in_full()" user would just
check against "<0" now, but this fixes the nasty and
stupid ones.
Appeals to authority aside, this makes it clear that
writing it this way does not have an intentional
benefit. It's a historical curiosity that we never
bothered to clean up (and which was undoubtedly
cargo-culted into new sites).
So let's convert these obviously-correct cases (this
includes write_str_in_full(), which is just a wrapper for
write_in_full()).
[1] A careful reader may notice there is one way that
write_in_full() can return a different value. If we ask
write() to write N bytes and get a return value that is
_larger_ than N, we could return a larger total. But
besides the fact that this would imply a totally broken
version of write(), it would already invoke undefined
behavior. Our internal remaining counter is an unsigned
size_t, which means that subtracting too many byte will
wrap it around to a very large number. So we'll instantly
begin reading off the end of the buffer, trying to write
gigabytes (or petabytes) of data.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These two configuration variables are described in the documentation
to take an expiry period expressed in the number of days:
gc.rerereResolved::
Records of conflicted merge you resolved earlier are
kept for this many days when 'git rerere gc' is run.
The default is 60 days.
gc.rerereUnresolved::
Records of conflicted merge you have not resolved are
kept for this many days when 'git rerere gc' is run.
The default is 15 days.
There is no strong reason not to allow a more general "approxidate"
expiry specification, e.g. "5.days.ago", or "never".
Rename the config_get_expiry() helper introduced in the previous
step to git_config_get_expiry_in_days() and move it to a more
generic place, config.c, and use date.c::parse_expiry_date() to do
so. Give it an ability to allow the caller to tell among three
cases (i.e. there is no "gc.rerereResolved" config, there is and it
is correctly parsed into the *expiry variable, and there was an
error in parsing the given value). The current caller can work
correctly without using the return value, though.
In the future, we may find other variables that only allow an
integer that specifies "this many days" or other unit of time, and
when it happens we may need to drop "_days" suffix from the name of
the function and instead pass the "scale" value as another parameter.
But this will do for now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The two configuration variables, gc.rerereResolved and
gc.rerereUnresolved, are measured in days and are passed as such
into the prune_one() helper function, which worked in time_t to see
if an entry in the rerere database is past its expiry.
Instead, have the caller turn the number of days into the expiry
timestamp. Further, use timestamp_t instead of time_t. This will
make it possible to extend the way the configuration variable is
spelled by using date.c::parse_expiry_date() that gives the expiry
timestamp in timestamp_t.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.
* ab/free-and-null:
*.[ch] refactoring: make use of the FREE_AND_NULL() macro
coccinelle: make use of the "expression" FREE_AND_NULL() rule
coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
coccinelle: make use of the "type" FREE_AND_NULL() rule
coccinelle: add a rule to make "type" code use FREE_AND_NULL()
git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
Replace occurrences of `free(ptr); ptr = NULL` which weren't caught by
the coccinelle rule. These fall into two categories:
- free/NULL assignments one after the other which coccinelle all put
on one line, which is functionally equivalent code, but very ugly.
- manually spotted occurrences where the NULL assignment isn't right
after the free() call.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are supposed to report errno from fopen(). fclose() between fopen()
and the report function could either change errno or reset it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fopen() returns NULL, it could be because the given path does not
exist, but it could also be some other errors and the caller has to
check. Add a wrapper so we don't have to repeat the same error check
everywhere.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Callers of the hold_locked_index() function pass 0 when they want to
prepare to write a new version of the index file without wishing to
die or emit an error message when the request fails (e.g. somebody
else already held the lock), and pass 1 when they want the call to
die upon failure.
This option is called LOCK_DIE_ON_ERROR by the underlying lockfile
API, and the hold_locked_index() function translates the paramter to
LOCK_DIE_ON_ERROR when calling the hold_lock_file_for_update().
Replace these hardcoded '1' with LOCK_DIE_ON_ERROR and stop
translating. Callers other than the ones that are replaced with
this change pass '0' to the function; no behaviour change is
intended with this patch.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Among the callers of hold_locked_index() that passes 0:
- diff.c::refresh_index_quietly() at the end of "git diff" is an
opportunistic update; it leaks the lockfile structure but it is
just before the program exits and nobody should care.
- builtin/describe.c::cmd_describe(),
builtin/commit.c::cmd_status(),
sequencer.c::read_and_refresh_cache() are all opportunistic
updates and they are OK.
- builtin/update-index.c::cmd_update_index() takes a lock upfront
but we may end up not needing to update the index (i.e. the
entries may be fully up-to-date), in which case we do not need to
issue an error upon failure to acquire the lock. We do diagnose
and die if we indeed need to update, so it is OK.
- wt-status.c::require_clean_work_tree() IS BUGGY. It asks
silence, does not check the returned value. Compare with
callsites like cmd_describe() and cmd_status() to notice that it
is wrong to call update_index_if_able() unconditionally.
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:
@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash
@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code for warning_errno/die_errno has been refactored and a new
error_errno() reporting helper is introduced.
* nd/error-errno: (41 commits)
wrapper.c: use warning_errno()
vcs-svn: use error_errno()
upload-pack.c: use error_errno()
unpack-trees.c: use error_errno()
transport-helper.c: use error_errno()
sha1_file.c: use {error,die,warning}_errno()
server-info.c: use error_errno()
sequencer.c: use error_errno()
run-command.c: use error_errno()
rerere.c: use error_errno() and warning_errno()
reachable.c: use error_errno()
mailmap.c: use error_errno()
ident.c: use warning_errno()
http.c: use error_errno() and warning_errno()
grep.c: use error_errno()
gpg-interface.c: use error_errno()
fast-import.c: use error_errno()
entry.c: use error_errno()
editor.c: use error_errno()
diff-no-index.c: use error_errno()
...
"git rerere" can encounter two or more files with the same conflict
signature that have to be resolved in different ways, but there was
no way to record these separate resolutions.
* jc/rerere-multi:
rerere: adjust 'forget' to multi-variant world order
rerere: split code to call ll_merge() further
rerere: move code related to "forget" together
rerere: gc and clear
rerere: do use multiple variants
t4200: rerere a merge with two identical conflicts
rerere: allow multiple variants to exist
rerere: delay the recording of preimage
rerere: handle leftover rr-cache/$ID directory and postimage files
rerere: scan $GIT_DIR/rr-cache/$ID when instantiating a rerere_id
rerere: split conflict ID further
Because conflicts with the same contents inside conflict blocks
enclosed by "<<<<<<<" and ">>>>>>>" can now have multiple variants
to help three-way merge to adjust to the differences outside the
conflict blocks, "rerere forget $path" needs to be taught that there
may be multiple recorded resolutions that share the same conflict
hash (which groups the conflicts with "the same contents inside
conflict blocks"), among which there are some that would not be
relevant to the conflict we are looking at. These "other variants"
that happen to share the same conflict hash should not be cleared,
and the variant that would apply to the current conflict may not be
the zero-th one (which is the only one that is cleared by the
current code).
After finding the conflict hash, iterate over the existing variants
and try to resolve the conflict using each of them to find the one
that "cleanly" resolves the current conflict. That is the one we
want to forget and record the preimage for, so that the user can
record the corrected resolution.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merge() helper function is given an existing rerere ID (i.e. the
name of the .git/rr-cache/* subdirectory, and the variant number)
that identifies one <preimage, postimage> pair, try to see if the
conflicted state in the given path can be resolved by using the pair,
and if this succeeds, then update the conflicted path with the
result in the working tree.
To implement rerere_forget() in the multiple variant world, we'd
need a helper to do the "see if a <preimage, postimage> pair cleanly
resolves a conflicted state we have in-core" part, without actually
touching any file in the working tree, in order to identify which
variant(s) to remove. Split the logic to do so into a separate
helper function try_merge() out of merge().
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"rerere forget" is the only user of handle_cache() helper, which in
turn is the only user of rerere_io that reads from an in-core buffer
whose getline method is implemented as rerere_mem_getline(). Gather
them together.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adjust "git rerere gc" and "git rerere clear" to the new world order
with rerere database with multiple variants for the same shape of
conflicts.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This enables the multiple-variant support for real. Multiple
conflicts of the same shape can have differences in contexts where
they appear, interfering the replaying of recorded resolution of one
conflict to another, and in such a case, their resolutions are
recorded as different variants under the same conflict ID.
We still need to adjust garbage collection codepaths for this
change, but the basic "replay" functionality is functional with
this change.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The shape of the conflict in a path determines the conflict ID. The
preimage and postimage pair that was recorded for the conflict ID
previously may or may not replay well for the conflict we just saw.
Currently, we punt when the previous resolution does not cleanly
replay, but ideally we should then be able to record the currently
conflicted path by assigning a new 'variant', and then record the
resolution the user is going to make.
Introduce a mechanism to have more than one variant for a given
conflict ID; we do not actually assign any variant other than 0th
variant yet at this step.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We record the preimage only when there is no directory to record the
conflict we encountered, i.e. when $GIT_DIR/rr-cache/$ID does not
exist. As the plan is to allow multiple <preimage,postimage> pairs
as variants for the same conflict ID eventually, this logic needs to
go.
As the first step in that direction, stop the "did we create the
directory? Then we record the preimage" logic. Instead, we record
if a preimage does not exist when we saw a conflict in a path. Also
make sure that we remove a stale postimage, which most likely is
totally unrelated to the resolution of this new conflict, when we
create a new preimage under $ID when $GIT_DIR/rr-cache/$ID already
exists.
In later patches, we will further update this logic to be "do we
have <preimage,postimage> pair that cleanly resolve the current
conflicts? If not, record a new preimage as a new variant", but
that does not happen at this stage yet.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If by some accident there is only $GIT_DIR/rr-cache/$ID directory
existed, we wouldn't have recorded a preimage for a conflict that
is newly encountered, which would mean after a manual resolution,
we wouldn't have recorded it by storing the postimage, because the
logic used to be "if there is no rr-cache/$ID directory, then we are
the first so record the preimage". Instead, record preimage if we
do not have one.
In addition, if there is only $GIT_DIR/rr-cache/$ID/postimage
without corresponding preimage, we would have tried to call into
merge() and punted.
These would have been a situation frustratingly hard to recover
from.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some calls to strcpy(3) triggers a false warning from static
analysers that are less intelligent than humans, and reducing the
number of these false hits helps us notice real issues. A few
calls to strcpy(3) in "git rerere" that are already safe has been
rewritten to avoid false wanings.
* jk/rerere-xsnprintf:
rerere: replace strcpy with xsnprintf
This will help fixing bootstrap corner-case issues, e.g. having an
empty $GIT_DIR/rr-cache/$ID directory would fail to record a
preimage, in later changes in this series.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The plan is to keep assigning the backward compatible conflict ID
based on the hash of the (normalized) text of conflicts, keep using
that conflict ID as the directory name under $GIT_DIR/rr-cache/, but
allow each conflicted path to use a separate "variant" to record
resolutions, i.e. having more than one <preimage,postimage> pairs
under $GIT_DIR/rr-cache/$ID/ directory. As the first step in that
direction, separate the shared "conflict ID" out of the rerere_id
structure.
The plan is to keep information per $ID in rerere_dir, that can be
shared among rerere_id that is per conflicted path.
When we are done with rerere(), which can be directly called from
other programs like "git apply", "git commit" and "git merge", the
shared rerere_dir structures can be freed entirely, so they are not
reference-counted and they are not freed when we release rerere_id's
that reference them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This shouldn't overflow, as we are copying a sha1 hex into a
41-byte buffer. But it does not hurt to use a bound-checking
function, which protects us and makes auditing for overflows
easier.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up and minor fixes.
* jc/rerere: (21 commits)
rerere: un-nest merge() further
rerere: use "struct rerere_id" instead of "char *" for conflict ID
rerere: call conflict-ids IDs
rerere: further clarify do_rerere_one_path()
rerere: further de-dent do_plain_rerere()
rerere: refactor "replay" part of do_plain_rerere()
rerere: explain the remainder
rerere: explain "rerere forget" codepath
rerere: explain the primary codepath
rerere: explain MERGE_RR management helpers
rerere: fix benign off-by-one non-bug and clarify code
rerere: explain the rerere I/O abstraction
rerere: do not leak mmfile[] for a path with multiple stage #1 entries
rerere: stop looping unnecessarily
rerere: drop want_sp parameter from is_cmarker()
rerere: report autoupdated paths only after actually updating them
rerere: write out each record of MERGE_RR in one go
rerere: lift PATH_MAX limitation
rerere: plug conflict ID leaks
rerere: handle conflicts with multiple stage #1 entries
...
There's a bug in builtin/am.c in which we take a lock on
MERGE_RR recursively. But rather than fix am.c, this patch
fixes the confusing interface from rerere.c that caused the
bug. Read on for the gory details.
The setup_rerere() function both reads the existing MERGE_RR
file, and takes MERGE_RR.lock. In the rerere() and
rerere_forget() functions, we end up in write_rr(), which
will then commit the lock file.
But for functions like rerere_clear() that do not write to
MERGE_RR, we expect the caller to have handled
setup_rerere(). That caller would then need to release the
lockfile, but it can't; the lock struct is local to
rerere.c.
For builtin/rerere.c, this is OK. We run a single rerere
operation and then exit immediately, which has the side
effect of rolling back the lockfile.
But in builtin/am.c, this is actively wrong. If we run "git
am -3 --skip", we call setup-rerere twice without releasing
the lock:
1. The "--skip" causes us to call am_rerere_clear(), which
calls setup_rerere(), but never drops the lock.
2. We then proceed to the next patch.
3. The "--3way" may cause us to call rerere() to handle
conflicts in that patch, but we are already holding the
lock. The lockfile code dies with:
BUG: prepare_tempfile_object called for active object
We could fix this by having rerere_clear() call
rollback_lock_file(). But it feels a bit odd for it to roll
back a lockfile that it did not itself take. So let's
simplify the interface further, and handle setup_rerere in
the function itself, taking away the question from the
caller over whether they need to do so.
We can give rerere_gc() the same treatment, as well (even
though it doesn't have any callers besides builtin/rerere.c
at this point). Note that these functions don't take flags
from their callers to pass along to setup_rerere; that's OK,
because the flags would not be meaningful for what they are
doing.
Both of those functions need to hold the lock because even
though they do not write to MERGE_RR, they are still writing
and should be protected from a simultaneous "rerere" run.
But rerere_remaining(), "rerere diff", and "rerere status"
are all read-only operations. They want to setup_rerere(),
but do not care about taking the lock in the first place.
Since our update of MERGE_RR is the usual atomic rename done
by commit_lock_file, they can just do a lockless read. For
that, we teach setup_rerere a READONLY flag to avoid the
lock.
As a bonus, this pushes builtin/rerere.c's setup_rerere call
closer to the functions that use it. Which means that "git
rerere totally-bogus-command" will no longer silently
exit(0) in a repository without rerere enabled.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the most common uses of git_path() is to pass a
constant, like git_path("MERGE_MSG"). This has two
drawbacks:
1. The return value is a static buffer, and the lifetime
is dependent on other calls to git_path, etc.
2. There's no compile-time checking of the pathname. This
is OK for a one-off (after all, we have to spell it
correctly at least once), but many of these constant
strings appear throughout the code.
This patch introduces a series of functions to "memoize"
these strings, which are essentially globals for the
lifetime of the program. We compute the value once, take
ownership of the buffer, and return the cached value for
subsequent calls. cache.h provides a helper macro for
defining these functions as one-liners, and defines a few
common ones for global use.
Using a macro is a little bit gross, but it does nicely
document the purpose of the functions. If we need to touch
them all later (e.g., because we learned how to change the
git_dir variable at runtime, and need to invalidate all of
the stored values), it will be much easier to have the
complete list.
Note that the shared-global functions have separate, manual
declarations. We could do something clever with the macros
(e.g., expand it to a declaration in some places, and a
declaration _and_ a definition in path.c). But there aren't
that many, and it's probably better to stay away from
too-magical macros.
Likewise, if we abandon the C preprocessor in favor of
generating these with a script, we could get much fancier.
E.g., normalizing "FOO/BAR-BAZ" into "git_path_foo_bar_baz".
But the small amount of saved typing is probably not worth
the resulting confusion to readers who want to grep for the
function's definition.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By consistently using "upon failure, set 'ret' and jump to out"
pattern, flatten the function further.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This gives a thin abstraction between the conflict ID that is a hash
value obtained by inspecting the conflicts and the name of the
directory under $GIT_DIR/rr-cache/, in which the previous resolution
is recorded to be replayed. The plan is to make sure that the
presence of the directory does not imply the presense of a previous
resolution and vice-versa, and later allow us to have more than one
pair of <preimage, postimage> for a given conflict ID.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most places we call conflict IDs "name" and some others we call them
"hex"; update all of them to "id".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the body of a loop that attempts to replay recorded
resolution for each conflicted path into a helper function, not
because I want to call it from multiple places later, but because
the logic has become too deeply nested and hard to read.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain the internals of rerere as in-code comments, while
sprinkling "NEEDSWORK" comment to highlight iffy bits and
questionable assumptions.
This covers the codepath that implements "rerere gc" and "rerere
clear".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain the internals of rerere as in-code comments, while
sprinkling "NEEDSWORK" comment to highlight iffy bits and
questionable assumptions.
This covers the codepath that implements "rerere forget".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain the internals of rerere as in-code comments, while
sprinkling "NEEDSWORK" comment to highlight iffy bits and
questionable assumptions.
This one covers the codepath reached from rerere(), the primary
interface to the subsystem.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain the internals of rerere as in-code comments, while
sprinkling "NEEDSWORK" comment to highlight iffy bits and
questionable assumptions.
This one covers the "$GIT_DIR/MERGE_RR" file and in-core merge_rr
that are used to keep track of the status of "rerere" session in
progress.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rerere_io_putconflict() wants to use a limited fixed-sized buf[] on
stack repeatedly to formulate a longer string, but its implementation
is doubly confusing:
* When it knows that the whole thing fits in buf[], it wants to
fill early part of buf[] with conflict marker characters,
followed by a LF and a NUL. It miscounts the size of the buffer
by 1 and does not use the last byte of buf[].
* When it needs to show only the early part of a long conflict
marker string (because the whole thing does not fit in buf[]), it
adjusts the number of bytes shown in the current round in a
strange-looking way. It makes sure that this round does not emit
all bytes and leaves at least one byte to the next round, so that
"it all fits" case will pick up the rest and show the terminating
LF. While this is correct, one needs to stop and think for a
while to realize why it is correct without an explanation.
Fix the benign off-by-one, and add comments to explain the
strange-looking size adjustment.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain the internals of rerere as in-code comments.
This one covers our thin I/O abstraction to read from either
a file or a memory while optionally writing out to a file.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A conflicted index can have multiple stage #1 entries when dealing
with a criss-cross merge and using the "resolve" merge strategy.
Plug the leak by reading only the first one of the same stage
entries.
Strictly speaking, this fix does change the semantics, in that we
used to use the last stage #1 entry as the common ancestor when
doing the plain-vanilla three-way merge, but with the leak fix, we
will use the first stage #1 entry. But it is not a grave backward
compatibility breakage. Either way, we are arbitrarily picking one
of multiple stage #1 entries and using it, ignoring others, and
there is no meaning in the ordering of these stage #1 entries.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
handle_cache() loops 3 times starting from an index entry that is
unmerged, while ignoring an entry for a path that is different from
what we are looking for.
As the index is sorted, once we see a different path, we know we saw
all stages for the path we are interested in. Just loop while we
see the same path and then break, instead of continuing for 3 times.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As the nature of the conflict marker line determines if there should
be a SP and label after it, the caller shouldn't have to pass the
parameter redundantly.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of writing the hash for a conflict, a HT, and the path
with three separate write_in_full() calls, format them into a
single record into a strbuf and write it out in one go.
As a more recent "rerere remaining" codepath abuses the .util field
of the merge_rr data to store a sentinel token, make sure that
codepath does not call into this function (of course, "remaining" is
a read-only operation and currently does not call it).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The MERGE_RR file records a collection of NUL-terminated entries,
each of which consists of
- a hash that identifies the conflict
- a HT
- the pathname
We used to read this piece-by-piece, and worse yet, read the
pathname part a byte at a time into a fixed buffer of size PATH_MAX.
Instead, read a whole entry using strbuf_getwholeline() and parse
out the fields. This way, we issue fewer read(2) calls and more
importantly we do not have to limit the pathname to PATH_MAX.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merge_rr string list stores the conflict ID (a hexadecimal
string that is used to index into $GIT_DIR/rr-cache) in the .util
field of its elements, and when do_plain_rerere() resolves a
conflict, the field is cleared. Also, when rerere_forget()
recomputes the conflict ID to updates the preimage file, the
conflict ID for the path is updated.
We forgot to free the existing conflict ID when we did these two
operations.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A conflicted index can have multiple stage #1 entries when dealing
with a criss-cross merge and using the "resolve" merge strategy.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When ac49f5ca (rerere "remaining", 2011-02-16) split out a new
helper function check_one_conflict() out of find_conflict()
function, so that the latter will use the returned value from the
new helper to update the loop control variable that is an index into
active_cache[], the new variable incremented the index by one too
many when it found a path with only stage #1 entry at the very end
of active_cache[].
This "strange" return value does not have any effect on the loop
control of two callers of this function, as they all notice that
active_nr+2 is larger than active_nr just like active_nr+1 is, but
nevertheless it puzzles the readers when they are trying to figure
out what the function is trying to do.
In fact, there is no need to do an early return. The code that
follows after skipping the stage #1 entry is fully prepared to
handle a case where the entry is at the very end of active_cache[].
Help future readers from unnecessary confusion by dropping an early
return. We skip the stage #1 entry, and if there are stage #2 and
stage #3 entries for the same path, we diagnose the path as
THREE_STAGED (otherwise we say PUNTED), and then we skip all entries
for the same path.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rerere forget" in a repository without rerere enabled gave a
cryptic error message; it should be a silent no-op instead.
* jk/rerere-forget-check-enabled:
rerere: exit silently on "forget" when rerere is disabled
If you run "git rerere forget foo" in a repository that does
not have rerere enabled, git hits an internal error:
$ git init -q
$ git rerere forget foo
fatal: BUG: attempt to commit unlocked object
The problem is that setup_rerere() will not actually take
the lock if the rerere system is disabled. We should notice
this and return early. We can return with a success code
here, because we know there is nothing to forget.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rerere" (invoked internally from many mergy operations) did
not correctly signal errors when told to update the working tree
files and failed to do so for whatever reason.
* jn/rerere-fail-on-auto-update-failure:
rerere: error out on autoupdate failure
"git rerere" (invoked internally from many mergy operations) did
not correctly signal errors when told to update the working tree
files and failed to do so for whatever reason.
* jn/rerere-fail-on-auto-update-failure:
rerere: error out on autoupdate failure
We have been silently tolerating errors by returning early with an
error that the caller ignores since rerere.autoupdate was introduced
in v1.6.0-rc0~120^2 (2008-06-22). So on error (for example if the
index is already locked), rerere can return success silently without
updating the index or with only some items in the index updated.
Better to treat such failures as a fatal error so the operator can
figure out what is wrong and fix it.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).
Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use `git_config_get_*()` family instead of `git_config()` to take advantage of
the config-set API which provides a cleaner control flow.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* rs/code-cleaning:
fsck: simplify fsck_commit_buffer() by using commit_list_count()
commit: use commit_list_append() instead of duplicating its code
merge: simplify merge_trivial() by using commit_list_append()
use strbuf_addch for adding single characters
use strbuf_addbuf for adding strbufs
This patch activates the DO_MATCH_DIRECTORY code in m_p_i(), which
makes "git diff HEAD submodule/" and "git diff HEAD submodule" produce
the same output. Previously only the version without trailing slash
returns the difference (if any).
That's the effect of new ce_path_match(). dir_path_match() is not
executed by the new tests. And it should not introduce regressions.
Previously if path "dir/" is passed in with pathspec "dir/", they
obviously match. With new dir_path_match(), the path becomes
_directory_ "dir" vs pathspec "dir/", which is not executed by the old
code path in m_p_i(). The new code path is executed and produces the
same result.
The other case is pathspec "dir" and path "dir/" is now turned to
"dir" (with DO_MATCH_DIRECTORY). Still the same result before or after
the patch.
So why change? Because of the next patch about clean.c.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A long time ago, for some reason I was not happy with
match_pathspec(). I created a better version, match_pathspec_depth()
that was suppose to replace match_pathspec()
eventually. match_pathspec() has finally been gone since 6 months
ago. Use the shorter name for match_pathspec_depth().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.
* jl/submodule-mv: (53 commits)
rm: delete .gitmodules entry of submodules removed from the work tree
mv: update the path entry in .gitmodules for moved submodules
submodule.c: add .gitmodules staging helper functions
mv: move submodules using a gitfile
mv: move submodules together with their work trees
rm: do not set a variable twice without intermediate reading.
t6131 - skip tests if on case-insensitive file system
parse_pathspec: accept :(icase)path syntax
pathspec: support :(glob) syntax
pathspec: make --literal-pathspecs disable pathspec magic
pathspec: support :(literal) syntax for noglob pathspec
kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
parse_pathspec: make sure the prefix part is wildcard-free
rename field "raw" to "_raw" in struct pathspec
tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
remove match_pathspec() in favor of match_pathspec_depth()
remove init_pathspec() in favor of parse_pathspec()
remove diff_tree_{setup,release}_paths
convert common_prefix() to use struct pathspec
...
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is
- diff-lib.c: setting CE_UPTODATE
- name-hash.c: setting CE_HASHED
- preload-index.c, read-cache.c, unpack-trees.c and
builtin/update-index: obvious
- entry.c: write_entry() may refresh the checked out entry via
fill_stat_cache_info(). This causes "non-const struct cache_entry
*" in builtin/apply.c, builtin/checkout-index.c and
builtin/checkout.c
- builtin/ls-files.c: --with-tree changes stagemask and may set
CE_UPDATE
Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.
So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.
The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:
diff --git a/cache.h b/cache.h
index 430d021..1692891 100644
--- a/cache.h
+++ b/cache.h
@@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
#define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
struct index_state {
- struct cache_entry **cache;
+ const struct cache_entry **cache;
unsigned int version;
unsigned int cache_nr, cache_alloc, cache_changed;
struct string_list *resolve_undo;
will help quickly identify them without bogus warnings.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The loop that fills in the buffers that are later passed to the merge
driver exits early when not all stages of a path are present in the index.
But since the buffer pointers are not initialized in advance, the
subsequent accesses are undefined.
Initialize buffer pointers in advance to avoid undefined behavior later.
That is not sufficient, though, to get correct operation of handle_cache().
The function replays a conflicted merge to extract the part inside the
conflict markers. As written, the loop exits early when a stage is missing.
Consequently, the buffers for later stages that would be present in the
index are not filled in and the merge is replayed with incomplete data.
Fix it by investigating all stages of the given path.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using 'git rerere forget .' after a merge that involved binary files
runs into an infinite loop if the binary file contains a zero byte.
Replace a strchrnul by memchr because the former does not make progress
as soon as the NUL is encountered.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A handful of files and directories we create had tighter than
necessary permission bits when the user wanted to have group
writability (e.g. by setting "umask 002").
* ar/clone-honor-umask-at-top:
add: create ADD_EDIT.patch with mode 0666
rerere: make rr-cache fanout directory honor umask
Restore umasks influence on the permissions of work tree created by clone
This is the last remaining call to mkdir(2) that restricts the permission
bits by passing 0755. Just use the same mkdir_in_gitdir() used to create
the leaf directories.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
git-submodule.sh: separate parens by a space to avoid confusing some shells
Documentation/technical/api-diff.txt: correct name of diff_unmerge()
read_gitfile_gently: use ssize_t to hold read result
remove tests of always-false condition
rerere.c: diagnose a corrupt MERGE_RR when hitting EOF between TAB and '\0'
* jm/maint-misc-fix:
read_gitfile_gently: use ssize_t to hold read result
remove tests of always-false condition
rerere.c: diagnose a corrupt MERGE_RR when hitting EOF between TAB and '\0'
If we reach EOF after the SHA1-then-TAB, yet before the NUL that
terminates each file name, we would fill the file name buffer with \255
bytes resulting from the repeatedly-failing fgetc (returns EOF/-1) and
ultimately complain about "filename too long", because no NUL was
encountered.
Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This moves the two features from builtin/rerere.c to a more library-ish
portion of the codebase. No behaviour change.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* load_file() returns a void pointer but is using 0 for the return
value
* builtin/receive-pack.c forgot to include builtin.h
* packet_trace_prefix can be marked static
* ll_merge takes a pointer for its last argument, not an int
* crc32 expects a pointer as the second argument but Z_NULL is defined
to be 0 (see 38f4d13 sparse fix: Using plain integer as NULL pointer,
2006-11-18 for more info)
Signed-off-by: Stephen Boyd <bebarino@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-new-workdir script in contrib/ makes a new work tree by sharing
many subdirectories of the .git directory with the original repository.
When rerere.enabled is set in the original repository, but the user has
not encountered any conflicts yet, the original repository may not yet
have .git/rr-cache directory.
When rerere wants to run in a new work tree created from such a young
original repository, it fails to mkdir(2) .git/rr-cache that is a symlink
to a yet-to-be-created directory.
There are three possible approaches to this:
- A naive solution is not to create a symlink in the git-new-workdir
script to a directory the original does not have (yet). This is not a
solution, as we tend to lazily create subdirectories of .git/, and
having rerere.enabled configuration set is a strong indication that the
user _wants_ to have this lazy creation to happen;
- We could always create .git/rr-cache upon repository creation. This is
tempting but will not help people with existing repositories.
- Detect this case by seeing that mkdir(2) failed with EEXIST, checking
that the path is a symlink, and try running mkdir(2) on the link
target.
This patch solves the issue by doing the third one.
Strictly speaking, this is incomplete. It does not attempt to handle
relative symbolic link that points into the original repository, but this
is good enough to help people who use contrib/workdir/git-new-workdir
script.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After "rerere" resolves conflicts by reusing old resolution, there would
be three kinds of paths with conflict in the index:
* paths that have been resolved in the working tree by rerere;
* paths that need further work whose resolution could be recorded;
* paths that need resolving that rerere won't help.
When the user wants a list of paths that need hand-resolving, output from
"rerere status" does not help, as it shows only the second category, but
the paths in the third category still needs work (rerere only makes sense
for regular files that have both our side and their side, and does not
help other kinds of conflicts, e.g. "we modified, they deleted").
The new subcommand "rerere remaining" can be used to show both. As
opposed to "rerere status", this subcommand also skips printing paths
that have been added to the index, since these paths are already
resolved and are no longer "remaining".
Initial patch provided by Junio. Refactored and modified to skip
resolved paths by Martin. Commit message mostly by Junio.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Martin von Zweigbergk <martin.von.zweigbergk@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>