The "--exit-code" option of "git diff" command learned to work with
the "--ext-diff" option.
* rs/external-diff-with-exit-code:
diff: fix --exit-code with external diff
diff: report unmerged paths as changes in run_diff_cmd()
You can ask the diff machinery to let the exit code indicate whether
there are changes, e.g. with --exit-code. It as two ways to calculate
that bit: The quick one assumes blobs with different hashes have
different content, and the more elaborate way actually compares the
contents, possibly applying transformations like ignoring whitespace.
Always use the slower path by setting the flag diff_from_contents,
because any of the files could have an external diff driver set via an
attribute, which might consider binary differences irrelevant, like e.g.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
You can ask the diff machinery to let the exit code indicate whether
there are changes, e.g. with --quiet. It as two ways to calculate that
bit: The quick one assumes blobs with different hashes have different
content, and the more elaborate way actually compares the contents,
possibly applying transformations like ignoring whitespace.
The quick way considers an unmerged file to be a change and reports
exit code 1, which makes sense.
The slower path uses the struct diff_options member found_changes to
indicate whether the blobs differ even with the transformations applied.
It's not set for unmerged files, though, resulting in exit code 0.
Set found_changes in run_diff_cmd() for unmerged files, for a consistent
exit code of 1 if there's an unmerged file, regardless of whether
whitespace is ignored.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow the default prefixes "a/" and "b/" to be tweaked by the
diff.srcPrefix and diff.dstPrefix configuration variables.
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some l10n translators translated the parameters "files", "param1" and
"param2" in the following message:
"synonym for --dirstat=files,param1,param2..."
Translating "param1" and "param2" is OK, but changing the parameter
"files" is wrong. The parameters that are not meant to be used verbatim
should be marked as placeholders, but the verbatim parameter not marked
as a placeholder should be left as is.
This change is a complement for commit 51e846e673 (doc: enforce
placeholders in documentation, 2023-12-25).
With the help of Jean-Noël,some parameter combinations in one
placeholder (e.g. "<param1,param2>...") are splited into seperate
placeholders.
Helped-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --no-index file1 file2" segfaulted while invoking the
external diff driver, which has been corrected.
* jk/diff-external-with-no-index:
diff: handle NULL meta-info when spawning external diff
Running this:
$ touch foo bar
$ chmod +x foo
$ git -c diff.external=echo diff --ext-diff --no-index foo bar
results in a segfault. The issue is that run_diff_cmd() passes a NULL
"xfrm_msg" variable to run_external_diff(), which feeds it to
strvec_push(), causing the segfault. The bug dates back to 82fbf269b9
(run_external_diff: use an argv_array for the command line, 2014-04-19),
though it mostly only ever worked accidentally. Before then, we just
stuck the NULL pointer into a "const char **" array, so our NULL ended
up acting as an extra end-of-argv sentinel (which was OK, because it was
the last thing in the array).
Curiously, though, this is only a problem with --no-index. We set up
xfrm_msg by calling fill_metainfo(). This result may be empty, or may
have text like "index 1234..5678\n", "rename from foo\nrename from
bar\n", etc. In run_external_diff(), we only look at xfrm_msg if the
"other" variable is not NULL. That variable is set when the paths of the
two sides of the diff pair aren't the same (in which case the
destination path becomes "other"). So normally it would kick in only for
a rename, in which case xfrm_msg should not be NULL (it would have the
rename information in it).
But with a "--no-index" of two blobs, we of course have two different
pathnames, and thus end up with a non-NULL "other" filename (which is
always just a repeat of the file2-name), but possibly a NULL xfrm_msg.
So how to fix it? I have a feeling that --no-index always passing
"other" to the external diff command is probably a bug. There was no
rename, and the name is always redundant with existing information we
pass (and this may even cause us to pass a useless "xfrm_msg" that
contains an "index 1234..5678" line). So one option would be to change
that behavior. We don't seem to have ever documented the "other" or
"xfrm_msg" parameters for external diffs.
But I'm not sure what fallout we might have from changing that behavior
now. So this patch takes the less-risky option, and simply teaches
run_external_diff() to avoid passing xfrm_msg when it's NULL. That makes
it agnostic to whether "other" and "xfrm_msg" always come as a pair. It
fixes the segfault now, and if we want to change the --no-index "other"
behavior on top, it will handle that, too.
Reported-by: Wilfred Hughes <me@wilfred.me.uk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove unused header "#include".
* en/header-cleanup:
treewide: remove unnecessary includes in source files
treewide: add direct includes currently only pulled in transitively
trace2/tr2_tls.h: remove unnecessary include
submodule-config.h: remove unnecessary include
pkt-line.h: remove unnecessary include
line-log.h: remove unnecessary include
http.h: remove unnecessary include
fsmonitor--daemon.h: remove unnecessary includes
blame.h: remove unnecessary includes
archive.h: remove unnecessary include
treewide: remove unnecessary includes in source files
treewide: remove unnecessary includes from header files
Each of these were checked with
gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).
...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file. These cases were:
* builtin/credential-cache.c
* builtin/pull.c
* builtin/send-pack.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The config callbacks for a few diff.* variables simply return -1 when we
encounter an error. The message you get mentions the offending location,
like:
fatal: bad config variable 'diff.algorithm' in file '.git/config' at line 7
but is vague about "bad" (as it must be, since the message comes from
the generic config code). Most callbacks add their own messages here, so
let's do the same. E.g.:
error: unknown value for config 'diff.algorithm': foo
fatal: bad config variable 'diff.algorithm' in file '.git/config' at line 7
I've written the string in a way that should be reusable for
translators, and matches another similar message in transport.c (there
doesn't yet seem to be a popular generic message to reuse here, so
hopefully this will get the ball rolling).
Note that in the case of diff.algorithm, our parse_algorithm_value()
helper does detect a NULL value string. But it's still worth detecting
it ourselves here, since we can give a more specific error message (and
which is the usual one for unexpected implicit-bool values).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the config parser sees an "implicit" bool like:
[core]
someVariable
it passes NULL to the config callback. Any callback code which expects a
string must check for NULL. This usually happens via helpers like
git_config_string(), etc, but some custom code forgets to do so and will
segfault.
These are all fairly vanilla cases where the solution is just the usual
pattern of:
if (!value)
return config_error_nonbool(var);
though note that in a few cases we have to split initializers like:
int some_var = initializer();
into:
int some_var;
if (!value)
return config_error_nonbool(var);
some_var = initializer();
There are still some broken instances after this patch, which I'll
address on their own in individual patches after this one.
Reported-by: Carlos Andrés Ramírez Cataño <antaigroupltda@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --no-such-option" and other corner cases around the exit
status of the "diff" command has been corrected.
* jk/diff-result-code-cleanup:
diff: drop useless "status" parameter from diff_result_code()
diff: drop useless return values in git-diff helpers
diff: drop useless return from run_diff_{files,index} functions
diff: die when failing to read index in git-diff builtin
diff: show usage for unknown builtin_diff_files() options
diff-files: avoid negative exit value
diff: spell DIFF_INDEX_CACHED out when calling run_diff_index()
"git diff -w --exit-code" with various options did not work
correctly, which is being addressed.
* jc/diff-exit-code-with-w-fixes:
diff: the -w option breaks --exit-code for --raw and other output modes
t4040: remove test that succeeded for a wrong reason
diff: teach "--stat -w --exit-code" to notice differences
diff: mode-only change should be noticed by "--patch -w --exit-code"
diff: move the fallback "--exit-code" code down
Extract the commonly used initialization of the --stat-width=<width>,
--stat-name-width=<width> and --stat-graph-with=<width> parameters to their
internal default values into a helper function, to avoid repeating the same
initialization code in a few places.
Add a couple of tests to additionally cover existing configuration options
diff.statNameWidth=<width> and diff.statGraphWidth=<width> when used by
git-merge to generate --stat outputs. This closes the gap that existed
previously in the --stat tests, and reduces the chances for having any
regressions introduced by this commit.
While there, perform a small bunch of minor wording tweaks in the improved
unit test, to improve its test-level consistency a bit.
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add new configuration option diff.statNameWidth=<width> that is equivalent
to the command-line option --stat-name-width=<width>, but it is ignored
by format-patch. This follows the logic established by the already
existing configuration option diff.statGraphWidth=<width>.
Limiting the widths of names and graphs in the --stat output makes sense
for interactive work on wide terminals with many columns, hence the support
for these configuration options. They don't affect format-patch because
it already adheres to the traditional 80-column standard.
Update the documentation and add more tests to cover new configuration
option diff.statNameWidth=<width>. While there, perform a few minor code
and whitespace cleanups here and there, as spotted.
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --no-such-option" and other corner cases around the exit
status of the "diff" command has been corrected.
* jk/diff-result-code-cleanup:
diff: drop useless "status" parameter from diff_result_code()
diff: drop useless return values in git-diff helpers
diff: drop useless return from run_diff_{files,index} functions
diff: die when failing to read index in git-diff builtin
diff: show usage for unknown builtin_diff_files() options
diff-files: avoid negative exit value
diff: spell DIFF_INDEX_CACHED out when calling run_diff_index()
"git diff -w --exit-code" with various options did not work
correctly, which is being addressed.
* jc/diff-exit-code-with-w-fixes:
diff: the -w option breaks --exit-code for --raw and other output modes
t4040: remove test that succeeded for a wrong reason
diff: teach "--stat -w --exit-code" to notice differences
diff: mode-only change should be noticed by "--patch -w --exit-code"
diff: move the fallback "--exit-code" code down
The output from "--raw", "--name-status", and "--name-only" modes in
"git diff" does depend on and does not reflect how certain different
contents are considered equal, unlike "--patch" and "--stat" output
modes do, when used with options like "-w" (another way of thinking
about it is that it is not like we recompute the hash of the blob
after removing all whitespaces to show "git diff --raw -w" output).
But the fact that "--raw" and friends ignore "-w" is not a good
excuse for "diff --raw -w --exit-code" to also ignore the request to
report the differences with its exit status. When run without "-w",
"git diff --exit-code --raw" does report with its exit status the
differences as requested, and we should do the same when run with
"-w", too.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many programs use diff_result_code() to get a user-visible program exit
code from a diff result (e.g., checking opts.found_changes if
--exit-code was requested).
This function also takes a "status" parameter, which seems at first
glance that it could be used to propagate an error encountered when
computing the diff. But it doesn't work that way:
- negative values are passed through as-is, but are not appropriate as
program exit codes
- when --exit-code or --check is in effect, we _ignore_ the passed-in
status completely. So a failed diff which did not have a chance to
set opts.found_changes would erroneously report "success, no
changes" instead of propagating the error.
After recent cleanups, neither of these bugs is possible to trigger, as
every caller just passes in "0". So rather than fixing them, we can
simply drop the useless parameter instead.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When options like "-w" is used while "--exit-code" option is in
effect, instead of the usual "do we have any filepair whose preimage
and postimage have different <mode,object>?" check, we need to compare
the contents of the blobs, taking into account that certain changes
are considered no-op.
With the previous step, we taught "--patch" codepath to set the
.found_changes bit correctly, even for a change that only affects
the mode and not object. The "--stat" codepath, however, did not
set the .found_changes bit at all. This lead to
$ git diff --stat -w --exit-code
for a change that does have an output to exit with status 0.
Set the bit by inspecting the list of paths the diffstat output is
given for (a mode-only change will still appear as a "0-line added
0-line deleted" change) to fix it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The codepath to notice the content-level changes, taking certain
no-op changes like "ignore whitespace" into account, forgot that
a mode-only change is still a change. This resulted in
$ git diff --patch --exit-code -w
to exit with status 0 even when there is such a mode-only change,
breaking both "--patch" and "--quiet" output formats.
Teach the builtin_diff() codepath that creation and deletion as well
as mode changes are all interesting changes.
Note that the test specifically checks removal of an empty file,
because if there is anything in the preimage (i.e. the removed file
is not empty), the removal would still trigger textual patch output
and the codepath for that does update .found_changes bit to report
that it found an interesting change. We need to make sure that the
.found_changes bit is set even without triggering textual patch
output.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "--exit-code" is asked and the code cannot just answer by
comparing the object names on both sides but needs to inspect and
compare the contents, there are two ways that the result is found
out.
Some output modes, like "--stat" and "--patch", inherently have to
inspect the contents in order to show the differences in the way
they do. The codepaths for these modes set the .found_changes bit
as they compute what to show.
However, other output modes do not need to inspect the contents to
show the differences in the way they do. The most notable example
is "--quiet", which does not need to compute any output to show.
When they are asked to report "--exit-code", they run the codepaths
for the "--patch" output with their output redirected to "/dev/null",
only to set the .found_changes bit.
Currently, this fallback invocation of "--patch" output is done
after the "--stat" output format and its friends and before the
"--patch" and internal callback logic. Move it to the end of
the sequence to clarify the fallback status of this code block.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Further shuffling of declarations across header files to streamline
file dependencies.
* cw/compat-util-header-cleanup:
git-compat-util: move alloc macros to git-compat-util.h
treewide: remove unnecessary includes for wrapper.h
kwset: move translation table from ctype
sane-ctype.h: create header for sane-ctype macros
git-compat-util: move wrapper.c funcs to its header
git-compat-util: move strbuf.c funcs to its header
Reduce reliance on a global state in the config reading API.
* gc/config-context:
config: pass source to config_parser_event_fn_t
config: add kvi.path, use it to evaluate includes
config.c: remove config_reader from configsets
config: pass kvi to die_bad_number()
trace2: plumb config kvi
config.c: pass ctx with CLI config
config: pass ctx with config files
config.c: pass ctx in configsets
config: add ctx arg to config_fn_t
urlmatch.h: use config_fn_t type
config: inline git_color_default_config
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Plumb "struct key_value_info" through all code paths that end in
die_bad_number(), which lets us remove the helper functions that read
analogous values from "struct config_reader". As a result, nothing reads
config_reader.config_kvi any more, so remove that too.
In config.c, this requires changing the signature of
git_configset_get_value() to 'return' "kvi" in an out parameter so that
git_configset_get_<type>() can pass it to git_config_<type>(). Only
numeric types will use "kvi", so for non-numeric types (e.g.
git_configset_get_string()), pass NULL to indicate that the out
parameter isn't needed.
Outside of config.c, config callbacks now need to pass "ctx->kvi" to any
of the git_config_<type>() functions that parse a config string into a
number type. Included is a .cocci patch to make that refactor.
The only exceptional case is builtin/config.c, where git_config_<type>()
is called outside of a config callback (namely, on user-provided input),
so config source information has never been available. In this case,
die_bad_number() defaults to a generic, but perfectly descriptive
message. Let's provide a safe, non-NULL for "kvi" anyway, but make sure
not to change the message.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).
In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.
Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:
- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed
Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.
The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:
- trace2/tr2_cfg.c:tr2_cfg_set_fl()
This is indirectly called by git_config_set() so that the trace2
machinery can notice the new config values and update its settings
using the tr2 config parsing function, i.e. tr2_cfg_cb().
- builtin/checkout.c:checkout_main()
This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
This might be worth refactoring away in the future, since
git_xmerge_config() can call git_default_config(), which can do much
more than just parsing.
Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.
Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a comment on top of add_diff_options, where common diff options are
listed, mentioning __git_diff_common_options in the completion script,
in the hope that contributors update it when they add new diff flags.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The vast majority of files including object-store.h did not need dir.h
nor khash.h. Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.
After this patch:
$ git grep -h include..object-store | sort | uniq -c
2 #include "object-store.h"
129 #include "object-store-ll.h"
Diff best viewed with `--color-moved`.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A long term (but rather minor) pet-peeve of mine was the name
ll-merge.[ch]. I thought it made it harder to realize what stuff was
related to merging when I was working on the merge machinery and trying
to improve it.
Further, back in d1cbe1e6d8 ("hash-ll.h: split out of hash.h to remove
dependency on repository.h", 2023-04-22), we have split the portions of
hash.h that do not depend upon repository.h into a "hash-ll.h" (due to
the recommendation to use "ll" for "low-level" in its name[1], but which
I used as a suffix precisely because of my distaste for "ll-merge").
When we discussed adding additional "*-ll.h" files, a request was made
that we use "ll" consistently as either a prefix or a suffix. Since it
is already in use as both a prefix and a suffix, the only way to do so
is to rename some files.
Besides my distaste for the ll-merge.[ch] name, let me also note that
the files
ll-fsmonitor.h, ll-hash.h, ll-merge.h, ll-object-store.h, ll-read-cache.h
would have essentially nothing to do with each other and make no sense
to group. But giving them the common "ll-" prefix would group them. Using
"-ll" as a suffix thus seems just much more logical to me. Rename
ll-merge.[ch] to merge-ll.[ch] to achieve this consistency, and to
ensure we get a more logical grouping of files.
[1] https://lore.kernel.org/git/kl6lsfcu1g8w.fsf@chooglen-macbookpro.roam.corp.google.com/
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.
Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first). This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For the functions defined in read-cache.c, move their declarations from
cache.h to a new header, read-cache-ll.h. Also move some related inline
functions from cache.h to read-cache.h. The purpose of the
read-cache-ll.h/read-cache.h split is that about 70% of the sites don't
need the inline functions and the extra headers they include.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git [-c log.follow=true] log [--follow] ':(glob)f**'" used to barf.
* jk/log-follow-with-non-literal-pathspec:
diff: detect pathspec magic not supported by --follow
diff: factor out --follow pathspec check
pathspec: factor out magic-to-name function
The "-s" (silent, squelch) option of the "diff" family of commands
did not interact with other options that specify the output format
well. This has been cleaned up so that it will clear all the
formatting options given before.
* jc/diff-s-with-other-options:
diff: fix interaction between the "-s" option and other options
The --follow code doesn't handle most forms of pathspec magic. We check
that no unexpected ones have made it to try_to_follow_renames() with a
runtime GUARD_PATHSPEC() check, which gives behavior like this:
$ git log --follow ':(icase)makefile' >/dev/null
BUG: tree-diff.c:596: unsupported magic 10
Aborted
The same is true of ":(glob)", ":(attr)", and so on. It's good that we
notice the problem rather than continuing and producing a wrong answer.
But there are two non-ideal things:
1. The idea of GUARD_PATHSPEC() is to catch programming errors where
low-level code gets unexpected pathspecs. We'd usually try to catch
unsupported pathspecs by passing a magic_mask to parse_pathspec(),
which would give the user a much better message like:
pathspec magic not supported by this command: 'icase'
That doesn't happen here because git-log usually _does_ support
all types of pathspec magic, and so it passes "0" for the mask
(this call actually happens in setup_revisions()). It needs to
distinguish the normal case from the "--follow" one but currently
doesn't.
2. In addition to --follow, we have the log.follow config option. When
that is set, we try to turn on --follow mode only when there is a
single pathspec (since --follow doesn't handle anything else). But
really, that ought to be expanded to "use --follow when the
pathspec supports it". Otherwise, we'd complain any time you use an
exotic pathspec:
$ git config log.follow true
$ git log ':(icase)makefile' >/dev/null
BUG: tree-diff.c:596: unsupported magic 10
Aborted
We should instead just avoid enabling follow mode if it's not
supported by this particular invocation.
This patch expands our diff_check_follow_pathspec() function to cover
pathspec magic, solving both problems.
A few final notes:
- we could also solve (1) by passing the appropriate mask to
parse_pathspec(). But that's not great for two reasons. One is that
the error message is less precise. It says "magic not supported by
this command", but really it is not the command, but rather the
--follow option which is the problem. The second is that it always
calls die(). But for our log.follow code, we want to speculatively
ask "is this pathspec OK?" and just get a boolean result.
- This is obviously the right thing to do for ':(icase)' and most
other magic options. But ':(glob)' is a bit odd here. The --follow
code doesn't support wildcards, but we allow them anyway. From
try_to_follow_renames():
#if 0
/*
* We should reject wildcards as well. Unfortunately we
* haven't got a reliable way to detect that 'foo\*bar' in
* fact has no wildcards. nowildcard_len is merely a hint for
* optimization. Let it slip for now until wildmatch is taught
* about dry-run mode and returns wildcard info.
*/
if (opt->pathspec.has_wildcard)
BUG("wildcards are not supported");
#endif
So something like "git log --follow 'Make*'" is already doing the
wrong thing, since ":(glob)" behavior is already the default (it is
used only to countermand an earlier --noglob-pathspecs).
So we _could_ loosen the guard to allow :(glob), since it just
behaves the same as pathspecs do by default. But it seems like a
backwards step to do so. It already doesn't work (it hits the BUG()
case currently), and given that the user took an explicit step to
say "this pathspec should glob", it is reasonable for us to say "no,
--follow does not support globbing" (or in the case of log.follow,
avoid turning on follow mode). Which is what happens after this
patch.
- The set of allowed pathspec magic is obviously the same as in
GUARD_PATHSPEC(). We could perhaps factor these out to avoid
repetition. The point of having separate masks and GUARD calls is
that we don't necessarily know which parsed pathspecs will be used
where. But in this case, the two are heavily correlated. Still,
there may be some value in keeping them separate; it would make
anyone think twice about adding new magic to the list in
diff_check_follow_pathspec(). They'd need to touch
try_to_follow_renames() as well, which is the code that would
actually need to be updated to handle more exotic pathspecs.
- The documentation for log.follow says that it enables --follow
"...when a single <path> is given". We could possibly expand that to
say "with no unsupported pathspec magic", but that raises the
question of documenting which magic is supported. I think the
existing wording of "single <path>" sufficiently encompasses the
idea (the forbidden magic is stuff that might match multiple
entries), and the spirit remains the same.
Reported-by: Jim Pryor <dubiousjim@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In --follow mode, we require exactly one pathspec. We check this
condition in two places:
- in diff_setup_done(), we complain if --follow is used with an
inapropriate pathspec
- in git-log's revision "tweak" function, we enable log.follow only if
the pathspec allows it
The duplication isn't a big deal right now, since the logic is so
simple. But in preparation for it becoming more complex, let's pull it
into a shared function.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --dirstat" leaked memory, which has been plugged.
* jc/dirstat-plug-leaks:
diff: plug leaks in dirstat
diff: refactor common tail part of dirstat computation
More header clean-up.
* en/header-split-cache-h-part-2: (22 commits)
reftable: ensure git-compat-util.h is the first (indirect) include
diff.h: reduce unnecessary includes
object-store.h: reduce unnecessary includes
commit.h: reduce unnecessary includes
fsmonitor: reduce includes of cache.h
cache.h: remove unnecessary headers
treewide: remove cache.h inclusion due to previous changes
cache,tree: move basic name compare functions from read-cache to tree
cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c
hash-ll.h: split out of hash.h to remove dependency on repository.h
tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h
dir.h: move DTYPE defines from cache.h
versioncmp.h: move declarations for versioncmp.c functions from cache.h
ws.h: move declarations for ws.c functions from cache.h
match-trees.h: move declarations for match-trees.c functions from cache.h
pkt-line.h: move declarations for pkt-line.c functions from cache.h
base85.h: move declarations for base85.c functions from cache.h
copy.h: move declarations for copy.c functions from cache.h
server-info.h: move declarations for server-info.c functions from cache.h
packfile.h: move pack_window and pack_entry from cache.h
...
Sergey Organov noticed and reported "--patch --no-patch --raw"
behaves differently from just "--raw". It turns out that there are
a few interesting bugs in the implementation and documentation.
* First, the documentation for "--no-patch" was unclear that it
could be read to mean "--no-patch" countermands an earlier
"--patch" but not other things. The intention of "--no-patch"
ever since it was introduced at d09cd15d (diff: allow --no-patch
as synonym for -s, 2013-07-16) was to serve as a synonym for
"-s", so "--raw --patch --no-patch" should have produced no
output, but it can be (mis)read to allow showing only "--raw"
output.
* Then the interaction between "-s" and other format options were
poorly implemented. Modern versions of Git uses one bit each to
represent formatting options like "--patch", "--stat" in a single
output_format word, but for historical reasons, "-s" also is
represented as another bit in the same word. This allows two
interesting bugs to happen, and we have both X-<.
(1) After setting a format bit, then setting NO_OUTPUT with "-s",
the code to process another "--<format>" option drops the
NO_OUTPUT bit to allow output to be shown again. However,
the code to handle "-s" only set NO_OUTPUT without unsetting
format bits set earlier, so the earlier format bit got
revealed upon seeing the second "--<format>" option. This is
the problem Sergey observed.
(2) After setting NO_OUTPUT with "-s", code to process
"--<format>" option can forget to unset NO_OUTPUT, leaving
the command still silent.
It is tempting to change the meaning of "--no-patch" to mean
"disable only the patch format output" and reimplement "-s" as "not
showing anything", but it would be an end-user visible change in
behavior. Let's fix the interactions of these bits to first make
"-s" work as intended.
The fix is conceptually very simple.
* Whenever we set DIFF_FORMAT_FOO because we saw the "--foo"
option (e.g. DIFF_FORMAT_RAW is set when the "--raw" option is
given), we make sure we drop DIFF_FORMAT_NO_OUTPUT. We forgot to
do so in some of the options and caused (2) above.
* When processing "-s" option, we should not just set
DIFF_FORMAT_NO_OUTPUT bit, but clear other DIFF_FORMAT_* bits.
We didn't do so and retained format bits set by options
previously seen, causing (1) above.
It is even more tempting to lose NO_OUTPUT bit and instead take
output_format word being 0 as its replacement, but that would break
the mechanism "git show" uses to default to "--patch" output, where
the distinction between telling the command to be silent with "-s"
and having no output format specified on the command line matters,
and an explicit output format given on the command line should not
be "combined" with the default "--patch" format.
So, while we cannot lose the NO_OUTPUT bit, as a follow-up work, we
may want to replace it with OPTION_GIVEN bit, and
* make "--patch", "--raw", etc. set DIFF_FORMAT_$format bit and
DIFF_FORMAT_OPTION_GIVEN bit on for each format. "--no-raw",
etc. will set off DIFF_FORMAT_$format bit but still record the
fact that we saw an option from the command line by setting
DIFF_FORMAT_OPTION_GIVEN bit.
* make "-s" (and its synonym "--no-patch") clear all other bits
and set only the DIFF_FORMAT_OPTION_GIVEN bit on.
which I suspect would make the code much cleaner without breaking
any end-user expectations.
Once that is in place, transitioning "--no-patch" to mean the
counterpart of "--patch", just like "--no-raw" only defeats an
earlier "--raw", would be quite simple at the code level. The
social cost of migrating the end-user expectations might be too
great for it to be worth, but at least the "GIVEN" bit clean-up
alone may be worth it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The array of dirstat_file contained in the dirstat_dir structure is
not freed after the processing ends. Unfortunately, the member that
points at the array, .files, is incremented as the gather_dirstat()
function recursively walks it, and this needs to be plugged by
remembering the beginning of the array before gather_dirstat() mucks
with it and freeing it after we are done.
We can mark t4047 as leak-free. t4000, which is marked as
leak-free, now can exercise dirstat in it, which will happen next.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark strtok() and strtok_r() to be banned.
* tb/ban-strtok:
banned.h: mark `strtok()` and `strtok_r()` as banned
t/helper/test-json-writer.c: avoid using `strtok()`
t/helper/test-oidmap.c: avoid using `strtok()`
t/helper/test-hashmap.c: avoid using `strtok()`
string-list: introduce `string_list_setlen()`
string-list: multi-delimiter `string_list_split_in_place()`
Header clean-up.
* en/header-split-cache-h: (24 commits)
protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
mailmap, quote: move declarations of global vars to correct unit
treewide: reduce includes of cache.h in other headers
treewide: remove double forward declaration of read_in_full
cache.h: remove unnecessary includes
treewide: remove cache.h inclusion due to pager.h changes
pager.h: move declarations for pager.c functions from cache.h
treewide: remove cache.h inclusion due to editor.h changes
editor: move editor-related functions and declarations into common file
treewide: remove cache.h inclusion due to object.h changes
object.h: move some inline functions and defines from cache.h
treewide: remove cache.h inclusion due to object-file.h changes
object-file.h: move declarations for object-file.c functions from cache.h
treewide: remove cache.h inclusion due to git-zlib changes
git-zlib: move declarations for git-zlib functions from cache.h
treewide: remove cache.h inclusion due to object-name.h changes
object-name.h: move declarations for object-name.c functions from cache.h
treewide: remove unnecessary cache.h inclusion
treewide: be explicit about dependence on mem-pool.h
treewide: be explicit about dependence on oid-array.h
...
Enhance `string_list_split_in_place()` to accept multiple characters as
delimiters instead of a single character.
Instead of using `strchr(2)` to locate the first occurrence of the given
delimiter character, `string_list_split_in_place_multi()` uses
`strcspn(2)` to move past the initial segment of characters comprised of
any characters in the delimiting set.
When only a single delimiting character is provided, `strpbrk(2)` (which
is implemented with `strcspn(2)`) has equivalent performance to
`strchr(2)`. Modern `strcspn(2)` implementations treat an empty
delimiter or the singleton delimiter as a special case and fall back to
calling strchrnul(). Both glibc[1] and musl[2] implement `strcspn(2)`
this way.
This change is one step to removing `strtok(2)` from the tree. Note that
`string_list_split_in_place()` is not a strict replacement for
`strtok()`, since it will happily turn sequential delimiter characters
into empty entries in the resulting string_list. For example:
string_list_split_in_place(&xs, "foo:;:bar:;:baz", ":;", -1)
would yield a string list of:
["foo", "", "", "bar", "", "", "baz"]
Callers that wish to emulate the behavior of strtok(2) more directly
should call `string_list_remove_empty_items()` after splitting.
To avoid regressions for the new multi-character delimter cases, update
t0063 in this patch as well.
[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35
[2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>