We clarified how "clean.requireForce" interacts with the "--dry-run"
option in the previous commit, both in the implementation and in the
documentation. Even when "git clean" (without other options) is
required to be used with "--force" (i.e. either clean.requireForce
is unset, or explicitly set to true) to protect end-users from
casual invocation of the command by mistake, "--dry-run" does not
require "--force" to be used, because it is already its own
protection mechanism by being a no-op to the working tree files.
The previous commit, however, missed another clean-up opportunity
around the same area. Just like in the "--dry-run" mode, the
command in the "--interactive" mode does not require "--force",
either. This is because by going interactive and giving the end
user one more chance to confirm, the mode itself is serving as its
own protection mechanism.
Let's take things one step further, and unify the code that defines
interaction between "--force" and these two other options. Just
like we added explanation for the reason why "--dry-run" does not
honor "clean.requireForce", give an explanation for the reason why
"--interactive" makes "clean.requireForce" to be ignored.
Finally, add some tests to show the interaction between "--force"
and "--interactive". We already have tests that show interaction
between "--force" and "--dry-run", but didn't test "--interactive".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
What -n actually does in addition to its documented behavior is
ignoring of configuration variable clean.requireForce, that makes
sense provided -n prevents files removal anyway.
So, first, document this in the manual, and then modify implementation
to make this more explicit in the code.
Improved implementation also stops to share single internal variable
'force' between command-line -f option and configuration variable
clean.requireForce, resulting in more clear logic.
Two error messages with slightly different text depending on if
clean.requireForce was explicitly set or not, are merged into a single
one.
The resulting error message now does not mention -n as well, as it
neither matches intended clean.requireForce usage nor reflects
clarified implementation.
Documentation of clean.requireForce is changed accordingly.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the standard parameterized message for reporting incompatible
options to inform users that they can't use -x and -X together. This
reduces the number of strings to translate and makes the UI slightly
more consistent.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce reliance on a global state in the config reading API.
* gc/config-context:
config: pass source to config_parser_event_fn_t
config: add kvi.path, use it to evaluate includes
config.c: remove config_reader from configsets
config: pass kvi to die_bad_number()
trace2: plumb config kvi
config.c: pass ctx with CLI config
config: pass ctx with config files
config.c: pass ctx in configsets
config: add ctx arg to config_fn_t
urlmatch.h: use config_fn_t type
config: inline git_color_default_config
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).
In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.
Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:
- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed
Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.
The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:
- trace2/tr2_cfg.c:tr2_cfg_set_fl()
This is indirectly called by git_config_set() so that the trace2
machinery can notice the new config values and update its settings
using the tr2 config parsing function, i.e. tr2_cfg_cb().
- builtin/checkout.c:checkout_main()
This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
This might be worth refactoring away in the future, since
git_xmerge_config() can call git_default_config(), which can do much
more than just parsing.
Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git_color_default_config() is a shorthand for calling two other config
callbacks. There are no other non-static functions that do this and it
will complicate our refactoring of config_fn_t so inline it instead.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This also made it clear that several .c files that depended upon path.h
were missing a #include for it; add the missing includes while at it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.
Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first). This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For the functions defined in read-cache.c, move their declarations from
cache.h to a new header, read-cache-ll.h. Also move some related inline
functions from cache.h to read-cache.h. The purpose of the
read-cache-ll.h/read-cache.h split is that about 70% of the sites don't
need the inline functions and the extra headers they include.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository->hash_algo). However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id. Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.
This allows hash.h and object.h to be fairly small, minimal headers. It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is another step towards letting us remove the include of cache.h in
strbuf.c. It does mean that we also need to add includes of abspath.h
in a number of C files.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Dozens of files made use of gettext functions, without explicitly
including gettext.h. This made it more difficult to find which files
could remove a dependence on cache.h. Make C files explicitly include
gettext.h if they are using it.
However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leak fixes.
* ab/various-leak-fixes:
push: free_refs() the "local_refs" in set_refspecs()
push: refactor refspec_append_mapped() for subsequent leak-fix
receive-pack: release the linked "struct command *" list
grep API: plug memory leaks by freeing "header_list"
grep.c: refactor free_grep_patterns()
builtin/merge.c: free "&buf" on "Your local changes..." error
builtin/merge.c: use fixed strings, not "strbuf", fix leak
show-branch: free() allocated "head" before return
commit-graph: fix a parse_options_concat() leak
http-backend.c: fix cmd_main() memory leak, refactor reg{exec,free}()
http-backend.c: fix "dir" and "cmd_arg" leaks in cmd_main()
worktree: fix a trivial leak in prune_worktrees()
repack: fix leaks on error with "goto cleanup"
name-rev: don't xstrdup() an already dup'd string
various: add missing clear_pathspec(), fix leaks
clone: use free() instead of UNLEAK()
commit-graph: use free_commit_graph() instead of UNLEAK()
bundle.c: don't leak the "args" in the "struct child_process"
tests: mark tests as passing with SANITIZE=leak
Fix memory leaks resulting from a missing clear_pathspec().
- archive.c: Plug a leak in the "struct archiver_args", and
clear_pathspec() the "pathspec" member that the "parse_pathspec_arg()"
call in this function populates.
- builtin/clean.c: Fix a memory leak that's been with us since
893d839970 (clean: convert to use parse_pathspec, 2013-07-14).
- builtin/reset.c: Add clear_pathspec() calls to cmd_reset(),
including to the codepaths where we'd return early.
- builtin/stash.c: Call clear_pathspec() on the pathspec initialized
in push_stash().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we've removed "git-add--interactive.perl" let's replace
mentions of it with "add-interactive.c". In the case of the "git add"
documentation we were using it as an example filename, so the mention
wasn't wrong, but using a dead file is slightly confusing.
The "borrowed" comment here likewise isn't wrong, but let's mention
the successor file instead. In the case of pathspec.c the implied TODO
item should refer to the current code (and the comment may not even be
current, I didn't check).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply "index-compatibility.pending.cocci" rule to "builtin/*", but
exclude those where we conflict with in-flight changes.
As a result some of them end up using only "the_index", so let's have
them use the more narrow "USE_THE_INDEX_VARIABLE" rather than
"USE_THE_INDEX_COMPATIBILITY_MACROS".
Manual changes not made by coccinelle, that were squashed in:
* Whitespace-wrap argument lists for repo_hold_locked_index(),
repo_read_index_preload() and repo_refresh_and_write_index(), in cases
where the line became too long after the transformation.
* Change "refresh_cache()" to "refresh_index()" in a comment in
"builtin/update-index.c".
* For those whose call was followed by perror("<macro-name>"), change
it to perror("<function-name>"), referring to the new function.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 4aab5b46f4 (Make read-cache.c "the_index" free., 2007-04-01)
we've been undergoing a slow migration away from these macros, but
haven't made much progress since f8adbec9fe (cache.h: flip
NO_THE_INDEX_COMPATIBILITY_MACROS switch, 2019-01-24).
Let's move forward a bit by changing the users of those macros that
are rare enough that we can convert them in one go, and then remove
the compatibility shim.
The only manual change to the C code here is to "cache.h", the rest is
all the result of applying the new "index-compatibility.cocci".
Even though it's a one-off, let's keep the coccinelle rules for
now. We'll extend them in subsequent commits, and this will help
anything that's in-flight or out-of-tree to migrate.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Correct uses of "<label>..." where we really meant to say
"[<label>...]", i.e. the command in question taken an optional set of
"<label>". As the CodingGuidelines notes "[o]ptional parts [should be]
enclosed in square brackets".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix various inconsistencies between command SYNOPSIS and the
corresponding -h output where our translatable labels didn't match
up.
In some cases we need to adjust the prose that follows the SYNOPSIS
accordingly, as it refers back to the changed label.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git update-index", "git checkout-index", and "git clean" are
taught to work better with the sparse checkout feature.
* vd/sparse-clean-etc:
update-index: reduce scope of index expansion in do_reupdate
update-index: integrate with sparse index
update-index: add tests for sparse-checkout compatibility
checkout-index: integrate with sparse index
checkout-index: add --ignore-skip-worktree-bits option
checkout-index: expand sparse checkout compatibility tests
clean: integrate with sparse index
reset: reorder wildcard pathspec conditions
reset: fix validation in sparse index test
Remove full index requirement for `git clean` and test to ensure the index
is not expanded in `git clean`. Add to existing test for `git clean` to
verify cleanup of untracked files in sparse directories is consistent
between sparse index and non-sparse index checkouts.
Signed-off-by: Victoria Dye <vdye@github.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the dir_init() function and replace it with a DIR_INIT
macro. In many cases in the codebase we need to initialize things with
a function for good reasons, e.g. needing to call another function on
initialization. The "dir_init()" function was not one such case, and
could trivially be replaced with a more idiomatic macro initialization
pattern.
The only place where we made use of its use of memset() was in
dir_clear() itself, which resets the contents of an an existing struct
pointer. Let's use the new "memcpy() a 'blank' struct on the stack"
idiom to do that reset.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git clean" and "git ls-files -i" had confusion around working on
or showing ignored paths inside an ignored directory, which has
been corrected.
* en/dir-traversal:
dir: introduce readdir_skip_dot_and_dotdot() helper
dir: update stale description of treat_directory()
dir: traverse into untracked directories if they may have ignored subfiles
dir: avoid unnecessary traversal into ignored directory
t3001, t7300: add testcase showcasing missed directory traversal
t7300: add testcase showing unnecessary traversal into ignored directory
ls-files: error out on -i unless -o or -c are specified
dir: report number of visited directories and paths with trace2
dir: convert trace calls to trace2 equivalents
Many places in the code were doing
while ((d = readdir(dir)) != NULL) {
if (is_dot_or_dotdot(d->d_name))
continue;
...process d...
}
Introduce a readdir_skip_dot_and_dotdot() helper to make that a one-liner:
while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) {
...process d...
}
This helper particularly simplifies checks for empty directories.
Also use this helper in read_cached_dir() so that our statistics are
consistent across platforms. (In other words, read_cached_dir() should
have been using is_dot_or_dotdot() and skipping such entries, but did
not and left it to treat_path() to detect and mark such entries as
path_none.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The variable `matches` used to hold the return of a `dir_path_match()`
call that was removed in 95c11ecc73 ("Fix error-prone fill_directory()
API; make it only return matches", 2020-04-01). Now `matches` will
always hold 0, which is the value it's initialized with; and the
condition `matches != MATCHED_EXACTLY` will always evaluate to true. So
let's remove this unnecessary variable.
Interestingly, it seems that `matches != MATCHED_EXACTLY` was already
unnecessary before 95c11ecc73. That's because `remove_directories` is
always set to 1 when we have pathspecs; So, in the condition
`!remove_directories && matches != MATCHED_EXACTLY`, we would either:
- have pathspecs (or have been given `-d`) and ignore `matches` because
`remove_directories` is 1; or
- not have pathspecs (nor `-d`) and end up just checking that
`0 != MATCHED_EXACTLY`, as `matches` would never get reassigned
after its zero initialization (because there is no pathspec to match).
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead. It shortens the code and infers the
element size automatically.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The quote_path() function computes a path (relative to its base
directory) and c-quotes the result if necessary. Teach it to take a
flags parameter to allow its behaviour to be enriched later.
No behaviour change intended.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no quote_path_absolute() or anything that causes confusion,
and one of the two large consumers already rename the long name
locally with a preprocessor macro.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The dir structure seemed to have a number of leaks and problems around
it. First I noticed that parent_hashmap and recursive_hashmap were
being leaked (though Peff noticed and submitted fixes before me). Then
I noticed in the previous commit that clear_directory() was only taking
responsibility for a subset of fields within dir_struct, despite the
fact that entries[] and ignored[] we allocated internally to dir.c.
That, of course, resulted in many callers either leaking or haphazardly
trying to free these arrays and their contents.
Digging further, I found that despite the pretty clear documentation
near the top of dir.h that folks were supposed to call clear_directory()
when the user no longer needed the dir_struct, there were four callers
that didn't bother doing that at all. However, two of them clearly
thought about leaks since they had an UNLEAK(dir) directive, which to me
suggests that the method to free the data was too unclear. I suspect
the non-obviousness of the API and its holes led folks to avoid it,
which then snowballed into further problems with the entries[],
ignored[], parent_hashmap, and recursive_hashmap problems.
Rename clear_directory() to dir_clear() to be more in line with other
data structures in git, and introduce a dir_init() to handle the
suggested memsetting of dir_struct to all zeroes. I hope that a name
like "dir_clear()" is more clear, and that the presence of dir_init()
will provide a hint to those looking at the code that they need to look
for either a dir_clear() or a dir_free() and lead them to find
dir_clear().
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The calling convention for the dir API is supposed to end with a call to
clear_directory() to free up no longer needed memory. However,
clear_directory() didn't free dir->entries or dir->ignored. I believe
this was an oversight, but a number of callers noticed memory leaks and
started free'ing these. Unfortunately, they did so somewhat haphazardly
(sometimes freeing the entries in the arrays, and sometimes only
free'ing the arrays themselves). This suggests the callers weren't
trying to make sure any possible memory used might be free'd, but just
the memory they noticed their usecase definitely had allocated.
Fix this mess by moving all the duplicated free'ing logic into
clear_directory(). End by resetting dir to a pristine state so it could
be reused if desired.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 6b1db43109 ("clean: teach clean -d to preserve ignored paths",
2017-05-23) added the following code block (among others) to git-clean:
if (remove_directories)
dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS;
The reason for these flags is well documented in the commit message, but
isn't obvious just from looking at the code. Add some explanations to
the code to make it clearer.
Further, it appears git-2.26 did not correctly handle this combination
of flags from git-clean. With both these flags and without
DIR_SHOW_IGNORED_TOO_MODE_MATCHING set, git is supposed to recurse into
all untracked AND ignored directories. git-2.26.0 clearly was not doing
that. I don't know the full reasons for that or whether git < 2.27.0
had additional unknown bugs because of that misbehavior, because I don't
feel it's worth digging into. As per the huge changes and craziness
documented in commit 8d92fb2927 ("dir: replace exponential algorithm
with a linear one", 2020-04-01), the old algorithm was a mess and was
thrown out. What I can say is that git-2.27.0 correctly recurses into
untracked AND ignored directories with that combination.
However, in clean's case we don't need to recurse into ignored
directories; that is just a waste of time. Thus, when git-2.27.0
started correctly handling those flags, we got a performance regression
report. Rather than relying on other bugs in fill_directory()'s former
logic to provide the behavior of skipping ignored directories, make use
of the DIR_SHOW_IGNORED_TOO_MODE_MATCHING value specifically added in
commit eec0f7f2b7 ("status: add option to show ignored files
differently", 2017-10-30) for this purpose.
Reported-by: Brian Malehorn <bmalehorn@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I spent a long time trying to figure out how and whether the code worked
with different values of ignore, ignore_only, and remove_directories.
After lots of time setting up lots of testcases, sifting through lots of
print statements, and walking through the debugger, I finally realized
that one piece of code related to how it was all setup was found in
clean.c rather than dir.c. Make a change that would have made it easier
for me to do the extra testing by putting this handling in one spot.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dir.h documented quite clearly that DIR_SHOW_IGNORED and
DIR_SHOW_IGNORED_TOO are mutually exclusive, with a big comment to this
effect by the definition of both enum values. However, a command like
git clean -fx $DIR
would set both values for dir.flags. I _think_ it happened to work
because:
* As dir.h points out, DIR_KEEP_UNTRACKED_CONTENTS only takes effect
if DIR_SHOW_IGNORED_TOO is set.
* As coded, I believe DIR_SHOW_IGNORED would just happen to take
precedence over DIR_SHOW_IGNORED_TOO in the code as currently
constructed.
Which is a long way of saying "we just got lucky".
Fix clean.c to avoid setting these mutually exclusive values at the same
time, and add a check to dir.c that will throw a BUG() to prevent anyone
else from making this mistake.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The directory traversal code had redundant recursive calls which
made its performance characteristics exponential with respect to
the depth of the tree, which was corrected.
* en/fill-directory-exponential:
completion: fix 'git add' on paths under an untracked directory
Fix error-prone fill_directory() API; make it only return matches
dir: replace double pathspec matching with single in treat_directory()
dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory()
dir: replace exponential algorithm with a linear one
dir: refactor treat_directory to clarify control flow
dir: fix confusion based on variable tense
dir: fix broken comment
dir: consolidate treat_path() and treat_one_path()
dir: fix simple typo in comment
t3000: add more testcases testing a variety of ls-files issues
t7063: more thorough status checking
In the codebase, there are many options which use OPTION_CALLBACK in a
plain ol' struct definition. However, we have the OPT_CALLBACK and
OPT_CALLBACK_F macros which are meant to abstract these plain struct
definitions away. These macros are useful as they semantically signal to
developers that these are just normal callback option with nothing fancy
happening.
Replace plain struct definitions of OPTION_CALLBACK with OPT_CALLBACK or
OPT_CALLBACK_F where applicable. The heavy lifting was done using the
following (disgusting) shell script:
#!/bin/sh
do_replacement () {
tr '\n' '\r' |
sed -e 's/{\s*OPTION_CALLBACK,\s*\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\s*0,\(\s*[^[:space:]}]*\)\s*}/OPT_CALLBACK(\1,\2,\3,\4,\5,\6)/g' |
sed -e 's/{\s*OPTION_CALLBACK,\s*\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\(\s*[^[:space:]}]*\)\s*}/OPT_CALLBACK_F(\1,\2,\3,\4,\5,\6,\7)/g' |
tr '\r' '\n'
}
for f in $(git ls-files \*.c)
do
do_replacement <"$f" >"$f.tmp"
mv "$f.tmp" "$f"
done
The result was manually inspected and then reformatted to match the
style of the surrounding code. Finally, using
`git grep OPTION_CALLBACK \*.c`, leftover results which were not handled
by the script were manually transformed.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are quite a few code locations (e.g. `git clean --interactive`)
where Git asks the user for an answer. In preparation for fixing a bug
shared by all of them, and also to DRY up the code, let's refactor it.
Please note that most of these callers trimmed white-space both at the
beginning and at the end of the answer, instead of trimming only the
end (as the caller in `add-patch.c` does).
Therefore, technically speaking, we change behavior in this patch. At
the same time, it can be argued that this is actually a bug fix.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Traditionally, the expected calling convention for the dir.c API was:
fill_directory(&dir, ..., pathspec)
foreach entry in dir->entries:
if (dir_path_match(entry, pathspec))
process_or_display(entry)
This may have made sense once upon a time, because the fill_directory() call
could use cheap checks to avoid doing full pathspec matching, and an external
caller may have wanted to do other post-processing of the results anyway.
However:
* this structure makes it easy for users of the API to get it wrong
* this structure actually makes it harder to understand
fill_directory() and the functions it uses internally. It has
tripped me up several times while trying to fix bugs and
restructure things.
* relying on post-filtering was already found to produce wrong
results; pathspec matching had to be added internally for multiple
cases in order to get the right results (see commits 404ebceda0
(dir: also check directories for matching pathspecs, 2019-09-17)
and 89a1f4aaf7 (dir: if our pathspec might match files under a
dir, recurse into it, 2019-09-17))
* it's bad for performance: fill_directory() already has to do lots
of checks and knows the subset of cases where it still needs to do
more checks. Forcing external callers to do full pathspec
matching means they must re-check _every_ path.
So, add the pathspec matching within the fill_directory() internals, and
remove it from external callers.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git clean" fixes.
* en/clean-nested-with-ignored:
dir: special case check for the possibility that pathspec is NULL
clean: fix theoretical path corruption
clean: rewrap overly long line
clean: avoid removing untracked files in a nested git repository
clean: disambiguate the definition of -d
git-clean.txt: do not claim we will delete files with -n/--dry-run
dir: add commentary explaining match_pathspec_item's return value
dir: if our pathspec might match files under a dir, recurse into it
dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case
dir: also check directories for matching pathspecs
dir: fix off-by-one error in match_pathspec_item
dir: fix typo in comment
t7300: add testcases showing failure to clean specified pathspecs
The internal code originally invented for ".gitignore" processing
got reshuffled and renamed to make it less tied to "excluding" and
stress more that it is about "matching", as it has been reused for
things like sparse checkout specification that want to check if a
path is "included".
* ds/include-exclude:
unpack-trees: rename 'is_excluded_from_list()'
treewide: rename 'exclude' methods to 'pattern'
treewide: rename 'EXCL_FLAG_' to 'PATTERN_FLAG_'
treewide: rename 'struct exclude_list' to 'struct pattern_list'
treewide: rename 'struct exclude' to 'struct path_pattern'
cmd_clean() had the following code structure:
struct strbuf abs_path = STRBUF_INIT;
for_each_string_list_item(item, &del_list) {
strbuf_addstr(&abs_path, prefix);
strbuf_addstr(&abs_path, item->string);
PROCESS(&abs_path);
strbuf_reset(&abs_path);
}
where I've elided a bunch of unnecessary details and PROCESS(&abs_path)
represents a big chunk of code rather than an actual function call. One
piece of PROCESS was:
if (lstat(abs_path.buf, &st))
continue;
which would cause the strbuf_reset() to be missed -- meaning that the
next path to be handled would have two paths concatenated. This path
used to use die_errno() instead of continue prior to commit 396049e5fb
("git-clean: refactor git-clean into two phases", 2013-06-25), but my
understanding of how correct_untracked_entries() works is that it will
prevent both dir/ and dir/file from being in the list to clean so this
should be dead code and the die_errno() should be safe. But I hesitate
to remove it since I am not certain.
However, we can fix both this bug and possible similar future bugs by
simply moving the strbuf_reset(&abs_path) to the beginning of the loop.
It'll result in N calls to strbuf_reset() instead of N-1, but that's a
small price to pay to avoid sneaky bugs like this.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users expect files in a nested git repository to be left alone unless
sufficiently forced (with two -f's). Unfortunately, in certain
circumstances, git would delete both tracked (and possibly dirty) files
and untracked files within a nested repository. To explain how this
happens, let's contrast a couple cases. First, take the following
example setup (which assumes we are already within a git repo):
git init nested
cd nested
>tracked
git add tracked
git commit -m init
>untracked
cd ..
In this setup, everything works as expected; running 'git clean -fd'
will result in fill_directory() returning the following paths:
nested/
nested/tracked
nested/untracked
and then correct_untracked_entries() would notice this can be compressed
to
nested/
and then since "nested/" is a directory, we would call
remove_dirs("nested/", ...), which would
check is_nonbare_repository_dir() and then decide to skip it.
However, if someone also creates an ignored file:
>nested/ignored
then running 'git clean -fd' would result in fill_directory() returning
the same paths:
nested/
nested/tracked
nested/untracked
but correct_untracked_entries() will notice that we had ignored entries
under nested/ and thus simplify this list to
nested/tracked
nested/untracked
Since these are not directories, we do not call remove_dirs() which was
the only place that had the is_nonbare_repository_dir() safety check --
resulting in us deleting both the untracked file and the tracked (and
possibly dirty) file.
One possible fix for this issue would be walking the parent directories
of each path and checking if they represent nonbare repositories, but
that would be wasteful. Even if we added caching of some sort, it's
still a waste because we should have been able to check that "nested/"
represented a nonbare repository before even descending into it in the
first place. Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use
it to prevent fill_directory() and friends from descending into nested
git repos.
With this change, we also modify two regression tests added in commit
91479b9c72 ("t7300: add tests to document behavior of clean and nested
git", 2015-06-15). That commit, nor its series, nor the six previous
iterations of that series on the mailing list discussed why those tests
coded the expectation they did. In fact, it appears their purpose was
simply to test _existing_ behavior to make sure that the performance
changes didn't change the behavior. However, these two tests directly
contradicted the manpage's claims that two -f's were required to delete
files/directories under a nested git repository. While one could argue
that the user gave an explicit path which matched files/directories that
were within a nested repository, there's a slippery slope that becomes
very difficult for users to understand once you go down that route (e.g.
what if they specified "git clean -f -d '*.c'"?) It would also be hard
to explain what the exact behavior was; avoid such problems by making it
really simple.
Also, clean up some grammar errors describing this functionality in the
git-clean manpage.
Finally, there are still a couple bugs with -ffd not cleaning out enough
(e.g. missing the nested .git) and with -ffdX possibly cleaning out the
wrong files (paying attention to outer .gitignore instead of inner).
This patch does not address these cases at all (and does not change the
behavior relative to those flags), it only fixes the handling when given
a single -f. See
https://public-inbox.org/git/20190905212043.GC32087@szeder.dev/ for more
discussion of the -ffd[X?] bugs.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The -d flag pre-dated git-clean's ability to have paths specified. As
such, the default for git-clean was to only remove untracked files in
the current directory, and -d existed to allow it to recurse into
subdirectories.
The interaction of paths and the -d option appears to not have been
carefully considered, as evidenced by numerous bugs and a dearth of
tests covering such pairings in the testsuite. The definition turns out
to be important, so let's look at some of the various ways one could
interpret the -d option:
A) Without -d, only look in subdirectories which contain tracked
files under them; with -d, also look in subdirectories which
are untracked for files to clean.
B) Without specified paths from the user for us to delete, we need to
have some kind of default, so...without -d, only look in
subdirectories which contain tracked files under them; with -d,
also look in subdirectories which are untracked for files to clean.
The important distinction here is that choice B says that the presence
or absence of '-d' is irrelevant if paths are specified. The logic
behind option B is that if a user explicitly asked us to clean a
specified pathspec, then we should clean anything that matches that
pathspec. Some examples may clarify. Should
git clean -f untracked_dir/file
remove untracked_dir/file or not? It seems crazy not to, but a strict
reading of option A says it shouldn't be removed. How about
git clean -f untracked_dir/file1 tracked_dir/file2
or
git clean -f untracked_dir_1/file1 untracked_dir_2/file2
? Should it remove either or both of these files? Should it require
multiple runs to remove both the files listed? (If this sounds like a
crazy question to even ask, see the commit message of "t7300: Add some
testcases showing failure to clean specified pathspecs" added earlier in
this patch series.) What if -ffd were used instead of -f -- should that
allow these to be removed? Should it take multiple invocations with
-ffd? What if a glob (such as '*tracked*') were used instead of
spelling out the directory names? What if the filenames involved globs,
such as
git clean -f '*.o'
or
git clean -f '*/*.o'
?
The current documentation actually suggests a definition that is
slightly different than choice A, and the implementation prior to this
series provided something radically different than either choices A or
B. (The implementation, though, was clearly just buggy). There may be
other choices as well. However, for almost any given choice of
definition for -d that I can think of, some of the examples above will
appear buggy to the user. The only case that doesn't have negative
surprises is choice B: treat a user-specified path as a request to clean
all untracked files which match that path specification, including
recursing into any untracked directories.
Change the documentation and basic implementation to use this
definition.
There were two regression tests that indirectly depended on the current
implementation, but neither was about subdirectory handling. These two
tests were introduced in commit 5b7570cfb4 ("git-clean: add tests for
relative path", 2008-03-07) which was solely created to add coverage for
the changes in commit fb328947c8e ("git-clean: correct printing relative
path", 2008-03-07). Both tests specified a directory that happened to
have an untracked subdirectory, but both were only checking that the
resulting printout of a file that was removed was shown with a relative
path. Update these tests appropriately.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list' makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!
It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.
This commit renames several methods defined in dir.h to make
more sense with the renamed 'struct exclude_list' to 'struct
pattern_list' and 'struct exclude' to 'struct path_pattern':
* last_exclude_matching() -> last_matching_pattern()
* parse_exclude() -> parse_path_pattern()
In addition, the word 'exclude' was replaced with 'pattern'
in the methods below:
* add_exclude_list()
* add_excludes_from_file_to_list()
* add_excludes_from_file()
* add_excludes_from_blob_to_list()
* add_exclude()
* clear_exclude_list()
A few methods with the word "exclude" remain. These will
be handled seperately. In particular, the method
"is_excluded()" is concretely about the .gitignore file
relative to a specific directory. This is the important
boundary between library and consumer: is_excluded() cares
about .gitignore, but is_excluded() calls
last_matching_pattern() to make that decision.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list' makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!
It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.
This commit renames 'struct exclude_list' to 'struct pattern_list'
and renames several variables called 'el' to 'pl'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `lstat()` failed, `git clean` would abort without an error
message, leaving the user quite puzzled.
In particular on Windows, where the default maximum path length is
quite small (yet there are ways to circumvent that limit in many
cases), it is very important that users be given an indication why
their command failed because of too long paths when it did.
This test case makes sure that a warning is issued that would have
helped the user who reported this issue:
https://github.com/git-for-windows/git/issues/521
Note that we temporarily set `core.longpaths = false` in the regression
test; this ensures forward-compatibility with the `core.longpaths`
feature that has not yet been upstreamed from Git for Windows.
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>