Commit graph

1569 commits

Author SHA1 Message Date
Elijah Newren
9b08091cb7 diff: have submodule_format logic avoid additional diff headers
Commit 95433eeed9 ("diff: add ability to insert additional headers for
paths", 2022-02-02) introduced the possibility of additional headers,
created in create_filepairs_for_header_only_notifications().  These are
represented by inserting additional pairs in diff_queued_diff which
always have a mode of 0 and a null_oid.  When these were added, one
code path was noted to assume that at least one of the diff_filespecs
in the pair were valid, and that codepath was corrected.

The submodule_format handling is another codepath with the same issue;
it would operate on these additional headers and attempt to display them
as submodule changes.  Prevent that by explicitly checking for "phoney"
filepairs (i.e. filepairs with both modes being 0).

Reported-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-02 09:22:25 -07:00
Ævar Arnfjörð Bjarmason
5cf88fd8b0 git-compat-util.h: use "UNUSED", not "UNUSED(var)"
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75d (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.

Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.

This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b24034754 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.

1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:49:48 -07:00
Jeff King
02c3c59e62 hashmap: mark unused callback parameters
Hashmap comparison functions must conform to a particular callback
interface, but many don't use all of their parameters. Especially the
void cmp_data pointer, but some do not use keydata either (because they
can easily form a full struct to pass when doing lookups). Let's mark
these to make -Wunused-parameter happy.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:18:55 -07:00
Jeff King
783a86c142 config: mark unused callback parameters
The callback passed to git_config() must conform to a particular
interface. But most callbacks don't actually look at the extra "void
*data" parameter. Let's mark the unused parameters to make
-Wunused-parameter happy.

Note there's one unusual case here in get_remote_default() where we
actually ignore the "value" parameter. That's because it's only checking
whether the option is found at all, and not parsing its value.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:18:55 -07:00
SZEDER Gábor
99d86d60e5 parse-options: PARSE_OPT_KEEP_UNKNOWN only applies to --options
The description of 'PARSE_OPT_KEEP_UNKNOWN' starts with "Keep unknown
arguments instead of erroring out".  This is a bit misleading, as this
flag only applies to unknown --options, while non-option arguments are
kept even without this flag.

Update the description to clarify this, and rename the flag to
PARSE_OPTIONS_KEEP_UNKNOWN_OPT to make this obvious just by looking at
the flag name.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 11:13:14 -07:00
Junio C Hamano
7f8d098b1b Merge branch 'ab/cocci-unused'
Add Coccinelle rules to detect the pattern of initializing and then
finalizing a structure without using it in between at all, which
happens after code restructuring and the compilers fail to
recognize as an unused variable.

* ab/cocci-unused:
  cocci: generalize "unused" rule to cover more than "strbuf"
  cocci: add and apply a rule to find "unused" strbufs
  cocci: have "coccicheck{,-pending}" depend on "coccicheck-test"
  cocci: add a "coccicheck-test" target and test *.cocci rules
  Makefile & .gitignore: ignore & clean "git.res", not "*.res"
  Makefile: remove mandatory "spatch" arguments from SPATCH_FLAGS
2022-07-18 13:31:57 -07:00
Ævar Arnfjörð Bjarmason
4f40f6cb73 cocci: add and apply a rule to find "unused" strbufs
Add a coccinelle rule to remove "struct strbuf" initialization
followed by calling "strbuf_release()" function, without any uses of
the strbuf in the same function.

See the tests in contrib/coccinelle/tests/unused.{c,res} for what it's
intended to find and replace.

The inclusion of "contrib/scalar/scalar.c" is because "spatch" was
manually run on it (we don't usually run spatch on contrib).

Per the "buggy code" comment we also match a strbuf_init() before the
xmalloc(), but we're not seeking to be so strict as to make checks
that the compiler will catch for us redundant. Saying we'll match
either "init" or "xmalloc" lines makes the rule simpler.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-06 12:24:43 -07:00
Johannes Schindelin
2715e8a931 merge-ort: make path_messages a strmap to a string_list
This allows us once again to get away with less data copying.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22 16:10:06 -07:00
René Scharfe
8af759374e diff: use mks_tempfile_dt()
Git uses temporary files to pass the contents of blobs to external diff
programs and textconv filters.  It calls mks_tempfile_ts() to create
them, which puts them all in the same directory.  This requires adding
a random name prefix.

Use mks_tempfile_dt() instead, which allows the files to have arbitrary
names, each in their own separate temporary directory.  This way they
can have the same basename as the original blob, which looks nicer in
graphical diff programs.

The test in t4020 to check the prettiness of the temporary paths was
neutered by 5476bdf0e8 (diff tests: don't ignore "git diff" exit code in
"read" loop, 2022-03-07), which removed its grep check without replacing
it with an equivalent test_cmp check.  Add one that only checks the
basename of the temporary file and nothing else.

And make the test more robust while at it, by using test_when_finished
to get rid of the added file even if the test fails.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-20 16:17:35 -07:00
Junio C Hamano
361c2566c0 Merge branch 'ab/plug-random-leaks'
Double-free fix for a recently merged topic.

* ab/plug-random-leaks:
  diff.c: fix a double-free regression in a18d66cefb
  tests: demonstrate "show --word-diff --color-moved" regression
2022-03-23 14:09:31 -07:00
Ævar Arnfjörð Bjarmason
77e56d55ba diff.c: fix a double-free regression in a18d66cefb
My a18d66cefb (diff.c: free "buf" in diff_words_flush(), 2022-03-04)
has what it retrospect is a rather obvious bug (I don't know what I
was thinking, if it all): We use the "emitted_symbols" allocation in
append_emitted_diff_symbol() N times, but starting with a18d66cefb
we'd free it after its first use!

The correct way to free this data would have been to add the free() to
the existing free_diff_words_data() function, so let's do that. The
"ecbdata->diff_words->opt->emitted_symbols" might be NULL, so let's
add a trivial free_emitted_diff_symbols() helper next to the function
that appends to it.

This fixes the "no effect on show from" leak tested for in the
preceding commit. Perhaps confusingly this change will skip that test
under SANITIZE=leak, but otherwise opt-in the
"t4015-diff-whitespace.sh" test.

The reason is that a18d66cefb "fixed" the leak in the preceding "no
effect on diff" test, but for the first call to diff_words_flush() the
"wol->buf" would be NULL, so we wouldn't double-free (and
SANITIZE=address would see nothing amiss). With this change we'll
still pass that test, showing that we've also fixed leaks on this
codepath.

We then have to skip the new "no effect on show" test because it
happens to trip over an unrelated memory leak (in revision.c). The
same goes for "move detection with submodules". Both of them pass with
SANITIZE=address though, which would error on the "no effect on show"
test before this change.

Reported-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-17 08:49:13 -07:00
Junio C Hamano
ccafbbfb4e Merge branch 'ab/plug-random-leaks'
Plug random memory leaks.

* ab/plug-random-leaks:
  repository.c: free the "path cache" in repo_clear()
  range-diff: plug memory leak in read_patches()
  range-diff: plug memory leak in common invocation
  lockfile API users: simplify and don't leak "path"
  commit-graph: stop fill_oids_from_packs() progress on error and free()
  commit-graph: fix memory leak in misused string_list API
  submodule--helper: fix trivial leak in module_add()
  transport: stop needlessly copying bundle header references
  bundle: call strvec_clear() on allocated strvec
  remote-curl.c: free memory in cmd_main()
  urlmatch.c: add and use a *_release() function
  diff.c: free "buf" in diff_words_flush()
  merge-base: free() allocated "struct commit **" list
  index-pack: fix memory leaks
2022-03-13 22:56:18 +00:00
Junio C Hamano
6d8d81ec36 Merge branch 'ac/usage-string-fixups'
Usage-string normalization.

* ac/usage-string-fixups:
  amend remaining usage strings according to style guide
2022-03-06 21:25:32 -08:00
Ævar Arnfjörð Bjarmason
a18d66cefb diff.c: free "buf" in diff_words_flush()
Amend the freeing logic added in e6e045f803 (diff.c: buffer all
output if asked to, 2017-06-29) to free the containing "buf" in
addition to its members.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-04 13:24:18 -08:00
Junio C Hamano
362f869ff2 Merge branch 'ab/diff-free-more'
Leakfixes.

* ab/diff-free-more:
  diff.[ch]: have diff_free() free options->parseopts
  diff.[ch]: have diff_free() call clear_pathspec(opts.pathspec)
2022-02-25 15:47:36 -08:00
Abhradeep Chakraborty
9e1f22c8ad amend remaining usage strings according to style guide
Usage strings for git (sub)command flags has a style guide that
suggests - first letter should not capitalized (unless required)
and it should skip full-stop at the end of line. But there are
some files where usage-strings do not follow the above mentioned
guide.

Amend the usage strings that don't follow the style convention/guide.

Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-23 14:43:10 -08:00
Junio C Hamano
9a160990ef Merge branch 'js/diff-filter-negation-fix'
"git diff --diff-filter=aR" is now parsed correctly.

* js/diff-filter-negation-fix:
  diff-filter: be more careful when looking for negative bits
  diff.c: move the diff filter bits definitions up a bit
  docs(diff): lose incorrect claim about `diff-files --diff-filter=A`
2022-02-16 15:14:30 -08:00
Junio C Hamano
90b7153806 Merge branch 'en/remerge-diff'
"git log --remerge-diff" shows the difference from mechanical merge
result and the result that is actually recorded in a merge commit.

* en/remerge-diff:
  diff-merges: avoid history simplifications when diffing merges
  merge-ort: mark conflict/warning messages from inner merges as omittable
  show, log: include conflict/warning messages in --remerge-diff headers
  diff: add ability to insert additional headers for paths
  merge-ort: format messages slightly different for use in headers
  merge-ort: mark a few more conflict messages as omittable
  merge-ort: capture and print ll-merge warnings in our preferred fashion
  ll-merge: make callers responsible for showing warnings
  log: clean unneeded objects during `log --remerge-diff`
  show, log: provide a --remerge-diff capability
2022-02-16 15:14:29 -08:00
Ævar Arnfjörð Bjarmason
6ee36364eb diff.[ch]: have diff_free() free options->parseopts
The "struct option" added in 4a28847839 (diff.c: prepare to use
parse_options() for parsing, 2019-01-27) would be free'd in the case
of diff_setup_done() being called.

But not all codepaths that allocate it reach that,
e.g. "t6427-diff3-conflict-markers.sh" will now free memory that it
didn't free before. By using FREE_AND_NULL() here (which
diff_setup_done() also does) we ensure that we free the memory, and
that we won't have double-free's.

Before this running:

    ./t6427-diff3-conflict-markers.sh -vixd --run=7

Would report:

    SUMMARY: LeakSanitizer: 7823 byte(s) leaked in 6 allocation(s).

But now we'll report:

    SUMMARY: LeakSanitizer: 703 byte(s) leaked in 5 allocation(s).

I.e. the largest leak in that particular test has now been addressed.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-16 13:50:37 -08:00
Ævar Arnfjörð Bjarmason
244c27242f diff.[ch]: have diff_free() call clear_pathspec(opts.pathspec)
Have the diff_free() function call clear_pathspec(). Since the
diff_flush() function calls this all its callers can be simplified to
rely on it instead.

When I added the diff_free() function in e900d494dc (diff: add an API
for deferred freeing, 2021-02-11) I simply missed this, or wasn't
interested in it. Let's consolidate this now. This means that any
future callers (and I've got revision.c in mind) that embed a "struct
diff_options" can simply call diff_free() instead of needing know that
it has an embedded pathspec.

This does fix a bunch of leaks, but I can't mark any test here as
passing under the SANITIZE=leak testing mode because in
886e1084d7 (builtin/: add UNLEAKs, 2017-10-01) an UNLEAK(rev) was
added, which plasters over the memory
leak. E.g. "t4011-diff-symlink.sh" would report fewer leaks with this
fix, but because of the UNLEAK() reports none.

I'll eventually loop around to removing that UNLEAK(rev) annotation as
I'll fix deeper issues with the revisions API leaking. This is one
small step on the way there, a new freeing function in revisions.c
will want to call this diff_free().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-16 13:50:13 -08:00
Elijah Newren
95433eeed9 diff: add ability to insert additional headers for paths
When additional headers are provided, we need to
  * add diff_filepairs to diff_queued_diff for each paths in the
    additional headers map which, unless that path is part of
    another diff_filepair already found in diff_queued_diff
  * format the headers (colorization, line_prefix for --graph)
  * make sure the various codepaths that attempt to return early
    if there are "no changes" take into account the headers that
    need to be shown.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02 10:02:28 -08:00
Johannes Schindelin
75408ca949 diff-filter: be more careful when looking for negative bits
The `--diff-filter=<bits>` option allows to filter the diff by certain
criteria, for example `R` to only show renamed files. It also supports
negating a filter via a down-cased letter, i.e. `r` to show _everything
but_ renamed files.

However, the code is a bit overzealous when trying to figure out whether
`git diff` should start with all diff-filters turned on because the user
provided a lower-case letter: if the `--diff-filter` argument starts
with an upper-case letter, we must not start with all bits turned on.

Even worse, it is possible to specify the diff filters in multiple,
separate options, e.g. `--diff-filter=AM [...] --diff-filter=m`.

Let's accumulate the include/exclude filters independently, and only
special-case the "only exclude filters were specified" case after
parsing the options altogether.

Note: The code replaced by this commit took pains to avoid setting any
unused bits of `options->filter`. That was unnecessary, though, as all
accesses happen via the `filter_bit_tst()` function using specific bits,
and setting the unused bits has no effect. Therefore, we can simplify
the code by using `~0` (or in this instance, `~<unwanted-bit>`).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-28 10:18:18 -08:00
Johannes Schindelin
4d4d4eaa7b diff.c: move the diff filter bits definitions up a bit
This prepares for a more careful handling of the `--diff-filter`
options over the next few commits.

This commit is best viewed with `--color-moved`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-28 10:18:17 -08:00
Junio C Hamano
c17de5a505 Merge branch 'ja/i18n-similar-messages'
Similar message templates have been consolidated so that
translators need to work on fewer number of messages.

* ja/i18n-similar-messages:
  i18n: turn even more messages into "cannot be used together" ones
  i18n: ref-filter: factorize "%(foo) atom used without %(bar) atom"
  i18n: factorize "--foo outside a repository"
  i18n: refactor "unrecognized %(foo) argument" strings
  i18n: factorize "no directory given for --foo"
  i18n: factorize "--foo requires --bar" and the like
  i18n: tag.c factorize i18n strings
  i18n: standardize "cannot open" and "cannot read"
  i18n: turn "options are incompatible" into "cannot be used together"
  i18n: refactor "%s, %s and %s are mutually exclusive"
  i18n: refactor "foo and bar are mutually exclusive"
2022-01-10 11:52:56 -08:00
Junio C Hamano
2b755b3371 Merge branch 'pw/diff-color-moved-fix'
Correctness and performance update to "diff --color-moved" feature.

* pw/diff-color-moved-fix:
  diff --color-moved: intern strings
  diff: use designated initializers for emitted_diff_symbol
  diff --color-moved-ws=allow-indentation-change: improve hash lookups
  diff --color-moved: stop clearing potential moved blocks
  diff --color-moved: shrink potential moved blocks as we go
  diff --color-moved: unify moved block growth functions
  diff --color-moved: call comparison function directly
  diff --color-moved-ws=allow-indentation-change: simplify and optimize
  diff: simplify allow-indentation-change delta calculation
  diff --color-moved: avoid false short line matches and bad zebra coloring
  diff --color-moved=zebra: fix alternate coloring
  diff --color-moved: rewind when discarding pmb
  diff --color-moved: factor out function
  diff --color-moved: clear all flags on blocks that are too short
  diff --color-moved: add perf tests
2022-01-05 14:01:29 -08:00
Jean-Noël Avila
246cac8505 i18n: turn even more messages into "cannot be used together" ones
Even if some of these messages are not subject to gettext i18n, this
helps bring a single style of message for a given error type.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-05 13:31:00 -08:00
Jean-Noël Avila
c488182903 i18n: refactor "%s, %s and %s are mutually exclusive"
Use placeholders for constant tokens. The strings are turned into
"cannot be used together"

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-05 13:29:23 -08:00
Phillip Wood
72962e8b3c diff --color-moved: intern strings
Taking inspiration from xdl_classify_record() assign an id to each
addition and deletion such that lines that match for the current
--color-moved-ws mode share the same unique id. This reduces the
number of hash lookups a little (calculating the ids still involves
one hash lookup per line) but the main benefit is that when growing
blocks of potentially moved lines we can replace string comparisons
which involve chasing a pointer with a simple integer comparison. On a
large diff this commit reduces the time to run 'diff --color-moved' by
37% compared to the previous commit and 31% compared to master, for
'diff --color-moved-ws=allow-indentation-change' the reduction is 28%
compared to the previous commit and 96% compared to master. There is
little change in the performance of 'git log --patch' as the diffs are
smaller.

Test                                                                  HEAD^              HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.33+0.05)    0.38(0.33+0.05)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.88(0.81+0.06)    0.55(0.50+0.04) -37.5%
4002.3: diff --color-moved-ws=allow-indentation-change large change   0.85(0.79+0.06)    0.61(0.54+0.06) -28.2%
4002.4: log --no-color-moved --no-color-moved-ws                      1.16(1.07+0.08)    1.15(1.09+0.05)  -0.9%
4002.5: log --color-moved --no-color-moved-ws                         1.31(1.22+0.08)    1.29(1.19+0.09)  -1.5%
4002.6: log --color-moved-ws=allow-indentation-change                 1.32(1.24+0.08)    1.31(1.18+0.13)  -0.8%

Test                                                                  master             HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)   0.38(0.33+0.05)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.75+0.04)   0.55(0.50+0.04) -31.2%
4002.3: diff --color-moved-ws=allow-indentation-change large change  14.20(14.15+0.05)   0.61(0.54+0.06) -95.7%
4002.4: log --no-color-moved --no-color-moved-ws                      1.15 (1.05+0.09)   1.15(1.09+0.05)  +0.0%
4002.5: log --color-moved --no-color-moved-ws                         1.30 (1.19+0.11)   1.29(1.19+0.09)  -0.8%
4002.6: log --color-moved-ws=allow-indentation-change                 1.70 (1.63+0.06)   1.31(1.18+0.13) -22.9%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:24:06 -08:00
Phillip Wood
b4a5c5c419 diff: use designated initializers for emitted_diff_symbol
This makes it clearer which fields are being explicitly initialized
and will simplify the next commit where we add a new field to the
struct.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:24:06 -08:00
Phillip Wood
25e61909e9 diff --color-moved-ws=allow-indentation-change: improve hash lookups
As libxdiff does not have a whitespace flag to ignore the indentation
the code for --color-moved-ws=allow-indentation-change uses
XDF_IGNORE_WHITESPACE and then filters out any hash lookups where
there are non-indentation changes. This filtering is inefficient as
we have to perform another string comparison.

By using the offset data that we have already computed to skip the
indentation we can avoid using XDF_IGNORE_WHITESPACE and safely remove
the extra checks which improves the performance by 11% and paves the
way for the elimination of string comparisons in the next commit.

This change slightly increases the run time of other --color-moved
modes. This could be avoided by using different comparison functions
for the different modes but after the next two commits there is no
measurable benefit in doing so.

There is a change in behavior for lines that begin with a form-feed or
vertical-tab character. Since b46054b374 ("xdiff: use
git-compat-util", 2019-04-11) xdiff does not treat '\f' or '\v' as
whitespace characters. This means that lines starting with those
characters are never considered to be blank and never match a line
that does not start with the same character. After this patch a line
matching "^[\f\v\r]*[ \t]*$" is considered to be blank by
--color-moved-ws=allow-indentation-change and lines beginning
"^[\f\v\r]*[ \t]*" can match another line if the suffixes match. This
changes the output of git show for d18f76dccf ("compat/regex: use the
regex engine from gawk for compat", 2010-08-17) as some lines in the
pre-image before a moved block that contain '\f' are now considered
moved as well as they match a blank line before the moved lines in the
post-image. This commit updates one of the tests to reflect this
change.

Test                                                                  HEAD^             HEAD
--------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.33+0.05)   0.38(0.33+0.05)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.86(0.82+0.04)   0.88(0.84+0.04)  +2.3%
4002.3: diff --color-moved-ws=allow-indentation-change large change   0.97(0.94+0.03)   0.86(0.81+0.05) -11.3%
4002.4: log --no-color-moved --no-color-moved-ws                      1.16(1.07+0.09)   1.16(1.06+0.09)  +0.0%
4002.5: log --color-moved --no-color-moved-ws                         1.32(1.26+0.06)   1.33(1.27+0.05)  +0.8%
4002.6: log --color-moved-ws=allow-indentation-change                 1.35(1.29+0.06)   1.33(1.24+0.08)  -1.5%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:24:06 -08:00
Phillip Wood
eec7f53b31 diff --color-moved: stop clearing potential moved blocks
moved_block_clear() was introduced in 74d156f4a1 ("diff
--color-moved-ws: fix double free crash", 2018-10-04) to free the
memory that was allocated when initializing a potential moved
block. However since 21536d077f ("diff --color-moved-ws: modify
allow-indentation-change", 2018-11-23) initializing a potential moved
block no longer allocates any memory. Up until the last commit we were
relying on moved_block_clear() to set the `match` pointer to NULL when
a block stopped matching, but since that commit we do not clear a
moved block that does not match so it does not make sense to clear
them elsewhere.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:24:06 -08:00
Phillip Wood
0e488f1732 diff --color-moved: shrink potential moved blocks as we go
Rather than setting `match` to NULL and then looping over the list of
potential matched blocks for a second time to remove blocks with no
matches just filter out the blocks with no matches as we go.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:24:06 -08:00
Phillip Wood
ff046a0066 diff --color-moved: unify moved block growth functions
After the last two commits pmb_advance_or_null() and
pmb_advance_or_null_multi_match() differ only in the comparison they
perform. Lets simplify the code by combining them into a single
function.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:24:06 -08:00
Phillip Wood
08fba1076f diff --color-moved: call comparison function directly
This change will allow us to easily combine pmb_advance_or_null() and
pmb_advance_or_null_multi_match() in the next commit. Calling
xdiff_compare_lines() directly rather than using a function pointer
from the hash map has little effect on the run time.

Test                                                                  HEAD^             HEAD
-------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.35+0.03)   0.38(0.32+0.06) +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.87(0.83+0.04)   0.87(0.80+0.06) +0.0%
4002.3: diff --color-moved-ws=allow-indentation-change large change   0.97(0.92+0.04)   0.97(0.93+0.04) +0.0%
4002.4: log --no-color-moved --no-color-moved-ws                      1.17(1.06+0.10)   1.16(1.10+0.05) -0.9%
4002.5: log --color-moved --no-color-moved-ws                         1.32(1.24+0.08)   1.31(1.22+0.09) -0.8%
4002.6: log --color-moved-ws=allow-indentation-change                 1.36(1.25+0.10)   1.35(1.25+0.10) -0.7%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:24:05 -08:00
Phillip Wood
52d14e166d diff --color-moved-ws=allow-indentation-change: simplify and optimize
If we already have a block of potentially moved lines then as we move
down the diff we need to check if the next line of each potentially
moved line matches the current line of the diff. The implementation of
--color-moved-ws=allow-indentation-change was needlessly performing
this check on all the lines in the diff that matched the current line
rather than just the current line. To exacerbate the problem finding
all the other lines in the diff that match the current line involves a
fuzzy lookup so we were wasting even more time performing a second
comparison to filter out the non-matching lines. Fixing this reduces
time to run
  git diff --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
by 93% compared to master and simplifies the code.

Test                                                                  HEAD^              HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.35+0.03)   0.38(0.35+0.03)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.86 (0.80+0.06)   0.87(0.83+0.04)  +1.2%
4002.3: diff --color-moved-ws=allow-indentation-change large change  19.01(18.93+0.06)   0.97(0.92+0.04) -94.9%
4002.4: log --no-color-moved --no-color-moved-ws                      1.16 (1.06+0.09)   1.17(1.06+0.10)  +0.9%
4002.5: log --color-moved --no-color-moved-ws                         1.32 (1.25+0.07)   1.32(1.24+0.08)  +0.0%
4002.6: log --color-moved-ws=allow-indentation-change                 1.71 (1.64+0.06)   1.36(1.25+0.10) -20.5%

Test                                                                  master             HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)   0.38(0.35+0.03)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.75+0.04)   0.87(0.83+0.04)  +8.7%
4002.3: diff --color-moved-ws=allow-indentation-change large change  14.20(14.15+0.05)   0.97(0.92+0.04) -93.2%
4002.4: log --no-color-moved --no-color-moved-ws                      1.15 (1.05+0.09)   1.17(1.06+0.10)  +1.7%
4002.5: log --color-moved --no-color-moved-ws                         1.30 (1.19+0.11)   1.32(1.24+0.08)  +1.5%
4002.6: log --color-moved-ws=allow-indentation-change                 1.70 (1.63+0.06)   1.36(1.25+0.10) -20.0%

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:24:05 -08:00
Phillip Wood
76e32d6193 diff: simplify allow-indentation-change delta calculation
Now that we reliably end a block when the sign changes we don't need
the whitespace delta calculation to rely on the sign.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:24:05 -08:00
Phillip Wood
eb89352504 diff --color-moved: avoid false short line matches and bad zebra coloring
When marking moved lines it is possible for a block of potential
matched lines to extend past a change in sign when there is a sequence
of added lines whose text matches the text of a sequence of deleted
and added lines. Most of the time either `match` will be NULL or
`pmb_advance_or_null()` will fail when the loop encounters a change of
sign but there are corner cases where `match` is non-NULL and
`pmb_advance_or_null()` successfully advances the moved block despite
the change in sign.

One consequence of this is highlighting a short line as moved when it
should not be. For example

-moved line  # Correctly highlighted as moved
+short line  # Wrongly highlighted as moved
 context
+moved line  # Correctly highlighted as moved
+short line
 context
-short line

The other consequence is coloring a moved addition following a moved
deletion in the wrong color. In the example below the first "+moved
line 3" should be highlighted as newMoved not newMovedAlternate.

-moved line 1 # Correctly highlighted as oldMoved
-moved line 2 # Correctly highlighted as oldMovedAlternate
+moved line 3 # Wrongly highlighted as newMovedAlternate
 context      # Everything else is highlighted correctly
+moved line 2
+moved line 3
 context
+moved line 1
-moved line 3

These false matches are more likely when using --color-moved-ws with
the exception of --color-moved-ws=allow-indentation-change which ties
the sign of the current whitespace delta to the sign of the line to
avoid this problem. The fix is to check that the sign of the new line
being matched is the same as the sign of the line that started the
block of potential matches.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:24:05 -08:00
Phillip Wood
eb315457f6 diff --color-moved=zebra: fix alternate coloring
b0a2ba4776 ("diff --color-moved=zebra: be stricter with color
alternation", 2018-11-23) sought to avoid using the alternate colors
unless there are two adjacent moved blocks of the same
sign. Unfortunately it contains two bugs that prevented it from fixing
the problem properly. Firstly `last_symbol` is reset at the start of
each iteration of the loop losing the symbol of the last line and
secondly when deciding whether to use the alternate color it should be
checking if the current line is the same sign of the last line, not a
different sign. The combination of the two errors means that we still
use the alternate color when we should do but we also use it when we
shouldn't. This is most noticable when using
--color-moved-ws=allow-indentation-change with hunks like

-this line gets indented
+    this line gets indented

where the post image is colored with newMovedAlternate rather than
newMoved. While this does not matter much, the next commit will change
the coloring to be correct in this case, so lets fix the bug here to
make it clear why the output is changing and add a regression test.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:24:05 -08:00
Phillip Wood
0990658bf8 diff --color-moved: rewind when discarding pmb
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:24:05 -08:00
Phillip Wood
7dfe427107 diff --color-moved: factor out function
This code is quite heavily indented and having it in its own function
simplifies an upcoming change.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:24:05 -08:00
Phillip Wood
bea084ba41 diff --color-moved: clear all flags on blocks that are too short
If a block of potentially moved lines is not long enough then the
DIFF_SYMBOL_MOVED_LINE flag is cleared on the matching lines so they
are not marked as moved. To avoid problems when we start rewinding
after an unsuccessful match in a couple of commits time make sure all
the move related flags are cleared, not just DIFF_SYMBOL_MOVED_LINE.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-09 13:24:05 -08:00
Ævar Arnfjörð Bjarmason
7f14609e29 run-command API users: use strvec_push(), not argv construction
Change a pattern of hardcoding an "argv" array size, populating it and
assigning to the "argv" member of "struct child_process" to instead
use "strvec_push()" to add data to the "args" member.

As noted in the preceding commit this moves us further towards being
able to remove the "argv" member in a subsequent commit

These callers could have used strvec_pushl(), but moving to
strvec_push() makes the diff easier to read, and keeps the arguments
aligned as before.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-25 22:15:07 -08:00
Ævar Arnfjörð Bjarmason
9865b6e6a4 *.[ch] *_INIT macros: use { 0 } for a "zero out" idiom
In C it isn't required to specify that all members of a struct are
zero'd out to 0, NULL or '\0', just providing a "{ 0 }" will
accomplish that.

Let's also change code that provided N zero'd fields to just
provide one, and change e.g. "{ NULL }" to "{ 0 }" for
consistency. I.e. even if the first member is a pointer let's use "0"
instead of "NULL". The point of using "0" consistently is to pick one,
and to not have the reader wonder why we're not using the same pattern
everywhere.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 14:47:59 -07:00
Derrick Stolee
ad90da7351 diff: ignore sparse paths in diffstat
The diff_populate_filespec() method is used to describe the diff after a
merge operation is complete. In order to avoid expanding a sparse index,
the reuse_worktree_file() needs to be adapted to ignore files that are
outside of the sparse-checkout cone. The file names and OIDs used for
this check come from the merged tree in the case of the ORT strategy,
not the index, hence the ability to look into these paths without having
already expanded the index.

The work done by reuse_worktree_file() is only an optimization, and
requires the file being on disk for it to be of any value. Thus, it is
safe to exit the method early if we do not expect the file on disk.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 15:49:04 -07:00
Junio C Hamano
55194925e6 Merge branch 'ab/pickaxe-pcre2'
* ab/pickaxe-pcre2:
  diff: --pickaxe-all typofix
2021-08-06 12:52:15 -07:00
Bagas Sanjaya
11c649b891 diff: --pickaxe-all typofix
When I was fixing fuzzies as I updating po/id.po for 2.33.0 l10n round,
I noticed a triple-dash typo (--pickaxe-all) at diff.c, which according
to git-diff(1) manpage, the correct option name should be --pickaxe-all.

Fix the typo.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-04 10:34:47 -07:00
Junio C Hamano
268055bfde Merge branch 'en/rename-limits-doc'
Documentation on "git diff -l<n>" and diff.renameLimit have been
updated, and the defaults for these limits have been raised.

* en/rename-limits-doc:
  rename: bump limit defaults yet again
  diffcore-rename: treat a rename_limit of 0 as unlimited
  doc: clarify documentation for rename/copy limits
  diff: correct warning message when renameLimit exceeded
2021-07-28 13:18:03 -07:00
Elijah Newren
94b82d5686 rename: bump limit defaults yet again
These were last bumped in commit 92c57e5c1d (bump rename limit
defaults (again), 2011-02-19), and were bumped both because processors
had gotten faster, and because people were getting ugly merges that
caused problems and reporting it to the mailing list (suggesting that
folks were willing to spend more time waiting).

Since that time:
  * Linus has continued recommending kernel folks to set
    diff.renameLimit=0 (maps to 32767, currently)
  * Folks with repositories with lots of renames were happy to set
    merge.renameLimit above 32767, once the code supported that, to
    get correct cherry-picks
  * Processors have gotten faster
  * It has been discovered that the timing methodology used last time
    probably used too large example files.

The last point is probably worth explaining a bit more:

  * The "average" file size used appears to have been average blob size
    in the linux kernel history at the time (probably v2.6.25 or
    something close to it).
  * Since bigger files are modified more frequently, such a computation
    weights towards larger files.
  * Larger files may be more likely to be modified over time, but are
    not more likely to be renamed -- the mean and median blob size
    within a tree are a bit higher than the mean and median of blob
    sizes in the history leading up to that version for the linux
    kernel.
  * The mean blob size in v2.6.25 was half the average blob size in
    history leading to that point
  * The median blob size in v2.6.25 was about 40% of the mean blob size
    in v2.6.25.
  * Since the mean blob size is more than double the median blob size,
    any file as big as the mean will not be compared to any files of
    median size or less (because they'd be more than 50% dissimilar).
  * Since it is the number of files compared that provides the O(n^2)
    behavior, median-sized files should matter more than mean-sized
    ones.

The combined effect of the above is that the file size used in past
calculations was likely about 5x too large.  Combine that with a CPU
performance improvement of ~30%, and we can increase the limits by
a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated
time limits.

Keeping the same approximate time limit probably makes sense for
diff.renameLimit (there is no progress feedback in e.g. git log -p),
but the experience above suggests merge.renameLimit could be extended
significantly.  In fact, it probably would make sense to have an
unlimited default setting for merge.renameLimit, but that would
likely need to be coupled with changes to how progress is displayed.
(See https://lore.kernel.org/git/YOx+Ok%2FEYvLqRMzJ@coredump.intra.peff.net/
for details in that area.)  For now, let's just bump the approximate
time limit from 10s to 1m.

(Note: We do not want to use actual time limits, because getting results
that depend on how loaded your system is that day feels bad, and because
we don't discover that we won't get all the renames until after we've
put in a lot of work rather than just upfront telling the user there are
too many files involved.)

Using the original time limit of 2s for diff.renameLimit, and bumping
merge.renameLimit from 10s to 60s, I found the following timings using
the simple script at the end of this commit message (on an AWS c5.xlarge
which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"):

      N   Timing
   1300    1.995s
   7100   59.973s

So let's round down to nice even numbers and bump the limits from
400->1000, and from 1000->7000.

Here is the measure_rename_perf script (adapted from
https://lore.kernel.org/git/20080211113516.GB6344@coredump.intra.peff.net/
in particular to avoid triggering the linear handling from
basename-guided rename detection):

    #!/bin/bash

    n=$1; shift

    rm -rf repo
    mkdir repo && cd repo
    git init -q -b main

    mkdata() {
      mkdir $1
      for i in `seq 1 $2`; do
        (sed "s/^/$i /" <../sample
         echo tag: $1
        ) >$1/$i
      done
    }

    mkdata initial $n
    git add .
    git commit -q -m initial

    mkdata new $n
    git add .
    cd new
    for i in *; do git mv $i $i.renamed; done
    cd ..
    git rm -q -rf initial
    git commit -q -m new

    time git diff-tree -M -l0 --summary HEAD^ HEAD

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-15 16:54:34 -07:00
Elijah Newren
05d2c61c67 diff: correct warning message when renameLimit exceeded
The warning when quadratic rename detection was skipped referred to
"inexact rename detection".  For years, the only linear portion of
rename detection was looking for exact renames, so "inexact rename
detection" was an accurate way to refer to the quadratic portion of
rename detection.  However, that changed with commit bd24aa2f97
(diffcore-rename: guide inexact rename detection based on basenames,
2021-02-14).  Let's instead use the term "exhaustive rename detection"
to refer to the quadratic portion.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-15 16:54:24 -07:00
Junio C Hamano
4da281e84d Merge branch 'ab/pickaxe-pcre2'
Rewrite the backend for "diff -G/-S" to use pcre2 engine when
available.

* ab/pickaxe-pcre2: (22 commits)
  xdiff-interface: replace discard_hunk_line() with a flag
  xdiff users: use designated initializers for out_line
  pickaxe -G: don't special-case create/delete
  pickaxe -G: terminate early on matching lines
  xdiff-interface: allow early return from xdiff_emit_line_fn
  xdiff-interface: prepare for allowing early return
  pickaxe -S: slightly optimize contains()
  pickaxe: rename variables in has_changes() for brevity
  pickaxe -S: support content with NULs under --pickaxe-regex
  pickaxe: assert that we must have a needle under -G or -S
  pickaxe: refactor function selection in diffcore-pickaxe()
  perf: add performance test for pickaxe
  pickaxe/style: consolidate declarations and assignments
  diff.h: move pickaxe fields together again
  pickaxe: die when --find-object and --pickaxe-all are combined
  pickaxe: die when -G and --pickaxe-regex are combined
  pickaxe tests: add missing test for --no-pickaxe-regex being an error
  pickaxe tests: test for -G, -S and --find-object incompatibility
  pickaxe tests: add test for "log -S" not being a regex
  pickaxe tests: add test for diffgrep_consume() internals
  ...
2021-07-13 16:52:50 -07:00
Junio C Hamano
65c18913de Merge branch 'pw/word-diff-zero-width-matches'
The word-diff mode has been taught to work better with a word
regexp that can match an empty string.

* pw/word-diff-zero-width-matches:
  word diff: handle zero length matches
2021-05-14 08:26:06 +09:00
Ævar Arnfjörð Bjarmason
5d93460024 xdiff-interface: replace discard_hunk_line() with a flag
Remove the dummy discard_hunk_line() function added in
3b40a090fd (diff: avoid generating unused hunk header lines,
2018-11-02) in favor of having a new XDL_EMIT_NO_HUNK_HDR flag, for
use along with the two existing and similar XDL_EMIT_* flags.

Unlike the recently amended xdiff_emit_line_fn interface which'll be
called in a loop in xdl_emit_diff(), the hunk header is only emitted
once.

It makes more sense to pass this as a flag than provide a dummy
callback because that function may be able to skip doing certain work
if it knows the caller is doing nothing with the hunk header.

It would be possible to do so in the case of -U0 now, but the benefit
of doing so is so small that I haven't bothered. But this leaves the
door open to that, and more importantly makes the API use more
intuitive.

The reason we're putting a flag in the gap between 1<<0 and 1<<2 is
that the old 1<<1 flag was removed in 907681e940 (xdiff: drop
XDL_EMIT_COMMON, 2016-02-23) without re-ordering the remaining flags.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
a8d5eb6dc0 xdiff-interface: prepare for allowing early return
Change the function prototype of xdiff_emit_line_fn to return an "int"
instead of "void". Change all of those functions to "return 0",
nothing checks those return values yet, and no behavior is being
changed.

In subsequent commits the interface will be changed to allow early
return via this new return value.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
d26ec88009 pickaxe: die when --find-object and --pickaxe-all are combined
Neither the --pickaxe-all documentation nor --find-object's has ever
suggested that you can combine the two. See f506b8e8b5 (git log/diff:
add -G<regexp> that greps in the patch text, 2010-08-23) and
15af58c1ad (diffcore: add a pickaxe option to find a specific blob,
2018-01-04).

But we've silently tolerated it, which makes the logic in
diffcore_pickaxe() harder to reason about. Let's assert that we won't
have the two combined.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Ævar Arnfjörð Bjarmason
188e9e28c5 pickaxe: die when -G and --pickaxe-regex are combined
When the -G and --pickaxe-regex options are combined we simply ignore
the --pickaxe-regex option. Let's die instead as suggested by our
documentation, since -G is always a regex.

When --pickaxe-regex was added in d01d8c6782 (Support for pickaxe
matching regular expressions, 2006-03-29) only the -S option
existed. Then when -G was added in f506b8e8b5 (git log/diff: add
-G<regexp> that greps in the patch text, 2010-08-23) neither the
documentation for --pickaxe-regex was updated accordingly, nor was
something like this assertion added.

Since 5bc3f0b567 (diffcore-pickaxe doc: document -S and -G properly,
2013-05-31) we've claimed that --pickaxe-regex should only be used
with -S, but have silently tolerated combining it with -G, let's die
instead.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-11 12:47:31 +09:00
Phillip Wood
0324e8fc6b word diff: handle zero length matches
If find_word_boundaries() encounters a zero length match (which can be
caused by matching a newline or using '*' instead of '+' in the regex)
we stop splitting the input into words which generates an inaccurate
diff. To fix this increment the start point when there is a zero
length match and try a new match. This is safe as posix regular
expressions always return the longest available match so a zero length
match means there are no longer matches available from the current
position.

Commit bf82940dbf (color-words: enable REG_NEWLINE to help user,
2009-01-17) prevented matching newlines in negated character classes
but it is still possible for the user to have an explicit newline
match in the regex which could cause a zero length match.

One could argue that having explicit newline matches or using '*'
rather than '+' are user errors but it seems to be better to work
round them than produce inaccurate diffs.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-05 18:53:42 +09:00
brian m. carlson
14228447c9 hash: provide per-algorithm null OIDs
Up until recently, object IDs did not have an algorithm member, only a
hash.  Consequently, it was possible to share one null (all-zeros)
object ID among all hash algorithms.  Now that we're going to be
handling objects from multiple hash algorithms, it's important to make
sure that all object IDs have a correct algorithm field.

Introduce a per-algorithm null OID, and add it to struct hash_algo.
Introduce a wrapper function as well, and use it everywhere we used to
use the null_oid constant.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-27 16:31:39 +09:00
brian m. carlson
5951bf467e Use the final_oid_fn to finalize hashing of object IDs
When we're hashing a value which is going to be an object ID, we want to
zero-pad that value if necessary.  To do so, use the final_oid_fn
instead of the final_fn anytime we're going to create an object ID to
ensure we perform this operation.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-27 16:31:38 +09:00
René Scharfe
ca56dadb4b use CALLOC_ARRAY
Add and apply a semantic patch for converting code that open-codes
CALLOC_ARRAY to use it instead.  It shortens the code and infers the
element size automatically.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-13 16:00:09 -08:00
Junio C Hamano
845d6030f8 Merge branch 'jc/diffcore-rotate'
"git {diff,log} --{skip,rotate}-to=<path>" allows the user to
discard diff output for early paths or move them to the end of the
output.

* jc/diffcore-rotate:
  diff: --{rotate,skip}-to=<path>
2021-02-25 16:43:30 -08:00
Junio C Hamano
1eb4136ac2 diff: --{rotate,skip}-to=<path>
In the implementation of "git difftool", there is a case where the
user wants to start viewing the diffs at a specific path and
continue on to the rest, optionally wrapping around to the
beginning.  Since it is somewhat cumbersome to implement such a
feature as a post-processing step of "git diff" output, let's
support it internally with two new options.

 - "git diff --rotate-to=C", when the resulting patch would show
   paths A B C D E without the option, would "rotate" the paths to
   shows patch to C D E A B instead.  It is an error when there is
   no patch for C is shown.

 - "git diff --skip-to=C" would instead "skip" the paths before C,
   and shows patch to C D E.  Again, it is an error when there is no
   patch for C is shown.

 - "git log [-p]" also accepts these two options, but it is not an
   error if there is no change to the specified path.  Instead, the
   set of output paths are rotated or skipped to the specified path
   or the first path that sorts after the specified path.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 09:30:42 -08:00
Ævar Arnfjörð Bjarmason
c45dc9cf30 diff: plug memory leak from regcomp() on {log,diff} -I
Fix a memory leak in 296d4a94e7 (diff: add -I<regex> that ignores
matching changes, 2020-10-20) by freeing the memory it allocates in
the newly introduced diff_free(). See the previous commit for details
on that.

This memory leak was intentionally introduced in 296d4a94e7, see the
discussion on a previous iteration of it in
https://lore.kernel.org/git/xmqqeelycajx.fsf@gitster.c.googlers.com/

At that time freeing the memory was somewhat tedious, but since it
isn't anymore with the newly introduced diff_free() let's use it.

Let's retain the pattern for diff_free_file() and add a
diff_free_ignore_regex(), even though (unlike "diff_free_file") we
don't need to call it elsewhere. I think this'll make for more
readable code than gradually accumulating a giant diff_free()
function, sharing "int i" across unrelated code etc.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-11 09:21:07 -08:00
Ævar Arnfjörð Bjarmason
e900d494dc diff: add an API for deferred freeing
Add a diff_free() function to free anything we may have allocated in
the "diff_options" struct, and the ability to make calling it a noop
by setting "no_free" in "diff_options".

This is required because when e.g. "git diff" is run we'll allocate
things in that struct, use the diff machinery once, and then exit.

But if we run e.g. "git log -p" we're going to re-use what we
allocated across multiple diff_flush() calls, and only want to free
things at the end.

We've thus ended up with features like the recently added "diff -I"[1]
where we'll leak memory. As it turns out it could have simply used the
pattern established in 6ea57703f6 (log: prepare log/log-tree to reuse
the diffopt.close_file attribute, 2016-06-22).

Manually adding more such flags to things log_tree_commit() every time
we need to allocate something would be tedious. Let's instead move
that fclose() code it to a new diff_free(), in anticipation of freeing
more things in that function in follow-up commits.

Some functions such as log_tree_commit() need an idiom of optionally
retaining a previous "no_free", as they may either free the memory
themselves, or their caller may do so. I'm keeping that idiom in
log_show_early() for good measure, even though I don't think it's
currently called in this manner. It also gets passed an existing
"struct rev_info", so future callers may want to set the "no_free"
flag.

This change is a bit hard to read because while the freeing pattern
we're introducing isn't unusual, the "file" member is a special
snowflake. We usually don't want to fclose() it. This is because
"file" is usually stdout, in which case we don't want to fclose()
it. We only want to opt-in to closing it when we e.g. open a file on
the filesystem. Thus the opt-in "close_file" flag.

So the API in general just needs a "no_free" flag to defer freeing,
but the "file" member still needs its "close_file" flag. This is made
more confusing because while refactoring this code we could replace
some "close_file=0" with "no_free=1", whereas others need to set both
flags.

This is because there were some cases where an existing "close_file=0"
meant "let's defer deallocation", and others where it meant "we don't
want to close this file handle at all".

1. 296d4a94e7 (diff: add -I<regex> that ignores matching changes,
   2020-10-20)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-11 09:21:05 -08:00
Junio C Hamano
0806279428 Merge branch 'sj/untracked-files-in-submodule-directory-is-not-dirty'
"git diff" showed a submodule working tree with untracked cruft as
"Submodule commit <objectname>-dirty", but a natural expectation is
that the "-dirty" indicator would align with "git describe --dirty",
which does not consider having untracked files in the working tree
as source of dirtiness.  The inconsistency has been fixed.

* sj/untracked-files-in-submodule-directory-is-not-dirty:
  diff: do not show submodule with untracked files as "-dirty"
2021-01-25 14:19:18 -08:00
Junio C Hamano
59fcf746f5 Merge branch 'jc/diff-I-status-fix'
"git diff -I<pattern> -exit-code" should exit with 0 status when
all the changes match the ignored pattern, but it didn't.

* jc/diff-I-status-fix:
  diff: correct interaction between --exit-code and -I<pattern>
2020-12-18 15:15:18 -08:00
Junio C Hamano
50f0439490 diff: correct interaction between --exit-code and -I<pattern>
Just like "git diff -w --exit-code" should exit with 0 when ignoring
whitespace differences results in no changes shown, if ignoring
certain changes with "git diff -I<pattern> --exit-code" result in an
empty patch, we should exit with 0.

The test suite did not cover the interaction between "--exit-code"
and "-w"; add one while adding a new test for "--exit-code" + "-I".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-16 17:33:26 -08:00
Sangeeta Jain
8ef9312464 diff: do not show submodule with untracked files as "-dirty"
Git diff reports a submodule directory as -dirty even when there are
only untracked files in the submodule directory. This is inconsistent
with what `git describe --dirty` says when run in the submodule
directory in that state.

Make `--ignore-submodules=untracked` the default for `git diff` when
there is no configuration variable or command line option, so that the
command would not give '-dirty' suffix to a submodule whose working
tree has untracked files, to make it consistent with `git
describe --dirty` that is run in the submodule working tree.

And also make `--ignore-submodules=none` the default for `git status`
so that the user doesn't end up deleting a submodule that has
uncommitted (untracked) files.

Signed-off-by: Sangeeta Jain <sangunb09@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08 14:27:35 -08:00
Junio C Hamano
bf0a430f70 Merge branch 'en/strmap'
A specialization of hashmap that uses a string as key has been
introduced.  Hopefully it will see wider use over time.

* en/strmap:
  shortlog: use strset from strmap.h
  Use new HASHMAP_INIT macro to simplify hashmap initialization
  strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
  strmap: enable allocations to come from a mem_pool
  strmap: add a strset sub-type
  strmap: split create_entry() out of strmap_put()
  strmap: add functions facilitating use as a string->int map
  strmap: enable faster clearing and reusing of strmaps
  strmap: add more utility functions
  strmap: new utility functions
  hashmap: provide deallocation function names
  hashmap: introduce a new hashmap_partial_clear()
  hashmap: allow re-use after hashmap_free()
  hashmap: adjust spacing to fix argument alignment
  hashmap: add usage documentation explaining hashmap_free[_entries]()
2020-11-21 15:14:38 -08:00
Junio C Hamano
d5e35329dd Merge branch 'jk/diff-release-filespec-fix'
Running "git diff" while allowing external diff in a state with
unmerged paths used to segfault, which has been corrected.

* jk/diff-release-filespec-fix:
  t7800: simplify difftool test
  diff: allow passing NULL to diff_free_filespec_data()
2020-11-21 15:14:38 -08:00
Jinoh Kang
246959346f diff: allow passing NULL to diff_free_filespec_data()
Commit 3aef54e8b8 ("diff: munmap() file contents before running external
diff") introduced calls to diff_free_filespec_data in
run_external_diff, which may pass NULL pointers.

Fix this and prevent any such bugs in the future by making
`diff_free_filespec_data(NULL)` a no-op.

Fixes: 3aef54e8b8 ("diff: munmap() file contents before running external diff")
Signed-off-by: Jinoh Kang <luke1337@theori.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-06 11:37:07 -08:00
Junio C Hamano
1ae0949a03 Merge branch 'mk/diff-ignore-regex'
"git diff" family of commands learned the "-I<regex>" option to
ignore hunks whose changed lines all match the given pattern.

* mk/diff-ignore-regex:
  diff: add -I<regex> that ignores matching changes
  merge-base, xdiff: zero out xpparam_t structures
2020-11-02 13:17:44 -08:00
Elijah Newren
6da1a25814 hashmap: provide deallocation function names
hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed
for a while, but aren't necessarily the clearest names, especially with
hashmap_partial_clear() being added to the mix and lazy-initialization
now being supported.  Peff suggested we adopt the following names[1]:

  - hashmap_clear() - remove all entries and de-allocate any
    hashmap-specific data, but be ready for reuse

  - hashmap_clear_and_free() - ditto, but free the entries themselves

  - hashmap_partial_clear() - remove all entries but don't deallocate
    table

  - hashmap_partial_clear_and_free() - ditto, but free the entries

This patch provides the new names and converts all existing callers over
to the new naming scheme.

[1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02 12:15:50 -08:00
Michał Kępień
296d4a94e7 diff: add -I<regex> that ignores matching changes
Add a new diff option that enables ignoring changes whose all lines
(changed, removed, and added) match a given regular expression.  This is
similar to the -I/--ignore-matching-lines option in standalone diff
utilities and can be used e.g. to ignore changes which only affect code
comments or to look for unrelated changes in commits containing a large
number of automatically applied modifications (e.g. a tree-wide string
replacement).  The difference between -G/-S and the new -I option is
that the latter filters output on a per-change basis.

Use the 'ignore' field of xdchange_t for marking a change as ignored or
not.  Since the same field is used by --ignore-blank-lines, identical
hunk emitting rules apply for --ignore-blank-lines and -I.  These two
options can also be used together in the same git invocation (they are
complementary to each other).

Rename xdl_mark_ignorable() to xdl_mark_ignorable_lines(), to indicate
that it is logically a "sibling" of xdl_mark_ignorable_regex() rather
than its "parent".

Signed-off-by: Michał Kępień <michal@isc.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-20 12:53:26 -07:00
Thomas Guyot-Sionnest
ff0c7fa8cb diff: fix modified lines stats with --stat and --numstat
Only skip diffstats when both oids are valid and identical. This check
was causing both false-positives (files included in diffstats with no
actual changes (0 lines modified) and false-negatives (showing 0 lines
modified in stats when files had actually changed).

Also replaced same_contents with may_differ to avoid confusion.

Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 12:31:45 -07:00
Junio C Hamano
9d4e7ec4d9 Merge branch 'jc/quote-path-cleanup'
"git status --short" quoted a path with SP in it when tracked, but
not those that are untracked, ignored or unmerged.  They are all
shown quoted consistently.

* jc/quote-path-cleanup:
  quote: turn 'nodq' parameter into a set of flags
  quote: rename misnamed sq_lookup[] to cq_lookup[]
  wt-status: consistently quote paths in "status --short" output
  quote_path: code clarification
  quote_path: optionally allow quoting a path with SP in it
  quote_path: give flags parameter to quote_path()
  quote_path: rename quote_path_relative() to quote_path()
2020-09-18 17:58:04 -07:00
Junio C Hamano
7c37c9750a quote: turn 'nodq' parameter into a set of flags
quote_c_style() and its friend quote_two_c_style() both take an
optional "please omit the double quotes around the quoted body"
parameter.  Turn it into a flag word, assign one bit out of it,
and call it CQUOTE_NODQ bit.

No behaviour change intended.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-10 13:08:07 -07:00
Junio C Hamano
bbdba3d883 Merge branch 'ss/submodule-summary-in-c'
Yet another subcommand of "git submodule" is getting rewritten in C.

* ss/submodule-summary-in-c:
  submodule: port submodule subcommand 'summary' from shell to C
  t7421: introduce a test script for verifying 'summary' output
  submodule: rename helper functions to avoid ambiguity
  submodule: remove extra line feeds between callback struct and macro
2020-09-09 13:53:05 -07:00
Junio C Hamano
b58e47a929 Merge branch 'mr/diff-hide-stat-wo-textual-change'
"git diff --stat -w" showed 0-line changes for paths whose changes
were only whitespaces, which was not intuitive.  We now omit such
paths from the stat output.

* mr/diff-hide-stat-wo-textual-change:
  diff: teach --stat to ignore uninteresting modifications
2020-09-03 12:37:05 -07:00
Junio C Hamano
096c948dab Merge branch 'dd/diff-customize-index-line-abbrev'
The output from the "diff" family of the commands had abbreviated
object names of blobs involved in the patch, but its length was not
affected by the --abbrev option.  Now it is.

* dd/diff-customize-index-line-abbrev:
  diff: index-line: respect --abbrev in object's name
  t4013: improve diff-post-processor logic
2020-08-31 15:49:46 -07:00
Junio C Hamano
51226147d1 Merge branch 'rs/patch-id-with-incomplete-line'
The patch-id computation did not ignore the "incomplete last line"
marker like whitespaces.

* rs/patch-id-with-incomplete-line:
  patch-id: ignore newline at end of file in diff_flush_patch_id()
2020-08-24 14:54:33 -07:00
Đoàn Trần Công Danh
3046c7f69a diff: index-line: respect --abbrev in object's name
A handful of Git's commands respect `--abbrev' for customizing length
of abbreviation of object names.

For diff-family, Git supports 2 different options for 2 different
purposes, `--full-index' for showing diff-patch object's name in full,
and `--abbrev' to customize the length of object names in diff-raw and
diff-tree header lines, without any options to customise the length of
object names in diff-patch format. When working with diff-patch format,
we only have two options, either full index, or default abbrev length.

Although, that behaviour is documented, it doesn't stop users from
trying to use `--abbrev' with the hope of customising diff-patch's
objects' name's abbreviation.

Let's allow the blob object names shown on the "index" line to be
abbreviated to arbitrary length given via the "--abbrev" option.

To preserve backward compatibility with old script that specify both
`--full-index' and `--abbrev', always show full object id
if `--full-index' is specified.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-21 12:43:05 -07:00
Matthew Rogers
1cf3d5db9b diff: teach --stat to ignore uninteresting modifications
When options such as --ignore-space-change are in use, files with
modifications can have no interesting textual changes worth showing.  In
such cases, "git diff --stat" shows 0 lines of additions and deletions.
Teach "git diff --stat" not to show such a path in its output, which
would be more natural.

However, we don't want to prevent the display  of all files that have 0
effective diffs since they could be the result of a rename, permission
change, or other similar operation that may still be of interest so we
special case additions and deletions as they are always interesting.

Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-19 17:53:32 -07:00
René Scharfe
82a62015a7 patch-id: ignore newline at end of file in diff_flush_patch_id()
Whitespace is ignored when calculating patch IDs.  This is done by
removing all whitespace from diff lines before hashing them, including
a newline at the end of a file.  If that newline is missing, however,
diff reports that fact in a separate line containing "\ No newline at
end of file\n", and this marker is hashed like a context line.

This goes against our goal of making patch IDs independent of
whitespace.  Use the same heuristic that 2485eab55c (git-patch-id: do
not trip over "no newline" markers, 2011-02-17) added to git patch-id
instead and skip diff lines that start with a backslash and a space
and are longer than twelve characters.

Reported-by: Tilman Vogel <tilman.vogel@web.de>
Initial-test-by: Tilman Vogel <tilman.vogel@web.de>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-18 16:14:01 -07:00
Shourya Shukla
180b154b09 submodule: rename helper functions to avoid ambiguity
The helper functions: show_submodule_summary(),
prepare_submodule_summary() and print_submodule_summary() are used by
the builtin_diff() function in diff.c to generate a summary of
submodules in the context of a diff. Functions with similar names are to
be introduced in the upcoming port of submodule's summary subcommand.

So, rename the helper functions to '*_diff_submodule_summary()' to avoid
ambiguity.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-12 14:12:58 -07:00
Jeff King
d70a9eb611 strvec: rename struct fields
The "argc" and "argv" names made sense when the struct was argv_array,
but now they're just confusing. Let's rename them to "nr" (which we use
for counts elsewhere) and "v" (which is rather terse, but reads well
when combined with typical variable names like "args.v").

Note that we have to update all of the callers immediately. Playing
tricks with the preprocessor is hard here, because we wouldn't want to
rewrite unrelated tokens.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-30 19:18:06 -07:00
Jeff King
ef8d7ac42a strvec: convert more callers away from argv_array name
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).

This patch converts remaining files from the first half of the alphabet,
to keep the diff to a manageable size.

The conversion was done purely mechanically with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe '
    s/ARGV_ARRAY/STRVEC/g;
    s/argv_array/strvec/g;
  '

and then selectively staging files with "git add '[abcdefghjkl]*'".
We'll deal with any indentation/style fallouts separately.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:18 -07:00
Jeff King
dbbcd44fb4 strvec: rename files from argv-array to strvec
This requires updating #include lines across the code-base, but that's
all fairly mechanical, and was done with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe 's/argv-array.h/strvec.h/'

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:17 -07:00
Junio C Hamano
0cd0afc9c6 Merge branch 'jk/diff-memuse-optim-with-stat-unmatch'
Reduce memory usage during "diff --quiet" in a worktree with too
many stat-unmatched paths.

* jk/diff-memuse-optim-with-stat-unmatch:
  diff: discard blob data from stat-unmatched pairs
2020-06-17 21:54:00 -07:00
Jeff King
d2d7fbe129 diff: discard blob data from stat-unmatched pairs
When performing a tree-level diff against the working tree, we may find
that our index stat information is dirty, so we queue a filepair to be
examined later. If the actual content hasn't changed, we call this a
stat-unmatch; the stat information was out of date, but there's no
actual diff.  Normally diffcore_std() would detect and remove these
identical filepairs via diffcore_skip_stat_unmatch().  However, when
"--quiet" is used, we want to stop the diff as soon as we see any
changes, so we check for stat-unmatches immediately in diff_change().

That check may require us to actually load the file contents into the
pair of diff_filespecs. If we find that the pair isn't a stat-unmatch,
then no big deal; we'd likely load the contents later anyway to generate
a patch, do rename detection, etc, so we want to hold on to it. But if
it is a stat-unmatch, then we have no more use for that data; the whole
point is that we're going discard the pair. However, we never free the
allocated diff_filespec data.

In most cases, keeping that data isn't a problem. We don't expect a lot
of stat-unmatch entries, and since we're using --quiet, we'd quit as
soon as we saw such a real change anyway. However, there are extreme
cases where it makes a big difference:

  1. We'd generally mmap() the working tree half of the pair. And since
     the OS may limit the total number of maps, we can run afoul of this
     in large repositories. E.g.:

       $ cd linux
       $ git ls-files | wc -l
       67959
       $ sysctl vm.max_map_count
       vm.max_map_count = 65530
       $ git ls-files | xargs touch ;# everything is stat-dirty!
       $ git diff --quiet
       fatal: mmap failed: Cannot allocate memory

     It should be unusual to have so many files stat-dirty, but it's
     possible if you've just run a script like "sed -i" or similar.

     After this patch, the above correctly exits with code 0.

  2. Even if you don't hit mmap limits, the index half of the pair will
     have been pulled from the object database into heap memory. Again
     in a clone of linux.git, running:

       $ git ls-files | head -n 10000 | xargs touch
       $ git diff --quiet

     peaks at 145MB heap before this patch, and 94MB after.

This patch solves the problem by freeing any diff_filespec data we
picked up during the "--quiet" stat-unmatch check in diff_changes.
Nobody is going to need that data later, so there's no point holding on
to it. There are a few things to note:

  - we could skip queueing the pair entirely, which could in theory save
    a little work. But there's not much to save, as we need a
    diff_filepair to feed to diff_filespec_check_stat_unmatch() anyway.
    And since we cache the result of the stat-unmatch checks, a later
    call to diffcore_skip_stat_unmatch() call will quickly skip over
    them. The diffcore code also counts up the number of stat-unmatched
    pairs as it removes them. It's doubtful any callers would care about
    that in combination with --quiet, but we'd have to reimplement the
    logic here to be on the safe side. So it's not really worth the
    trouble.

  - I didn't write a test, because we always produce the correct output
    unless we run up against system mmap limits, which are both
    unportable and expensive to test against. Measuring peak heap
    would be interesting, but our perf suite isn't yet capable of that.

  - note that diff without "--quiet" does not suffer from the same
    problem. In diffcore_skip_stat_unmatch(), we detect the stat-unmatch
    entries and drop them immediately, so we're not carrying their data
    around.

  - you _can_ still trigger the mmap limit problem if you truly have
    that many files with actual changes. But it's rather unlikely. The
    stat-unmatch check avoids loading the file contents if the sizes
    don't match, so you'd need a pretty trivial change in every single
    file. Likewise, inexact rename detection might load the data for
    many files all at once. But you'd need not just 64k changes, but
    that many deletions and additions. The most likely candidate is
    perhaps break-detection, which would load the data for all pairs and
    keep it around for the content-level diff. But again, you'd need 64k
    actually changed files in the first place.

    So it's still possible to trigger this case, but it seems like "I
    accidentally made all my files stat-dirty" is the most likely case
    in the real world.

Reported-by: Jan Christoph Uhde <Jan@UhdeJc.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-02 09:28:56 -07:00
Laurent Arnoud
c28ded83fc diff: add config option relative
The `diff.relative` boolean option set to `true` shows only changes in
the current directory/value specified by the `path` argument of the
`relative` option and shows pathnames relative to the aforementioned
directory.

Teach `--no-relative` to override earlier `--relative`

Add for git-format-patch(1) options documentation `--relative` and
`--no-relative`

Signed-off-by: Laurent Arnoud <laurent@spkdev.net>
Acked-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-24 16:23:59 -07:00
Junio C Hamano
8f5dc5a4af Merge branch 'jt/avoid-prefetch-when-able-in-diff'
"git diff" in a partial clone learned to avoid lazy loading blob
objects in more casese when they are not needed.

* jt/avoid-prefetch-when-able-in-diff:
  diff: restrict when prefetching occurs
  diff: refactor object read
  diff: make diff_populate_filespec_options struct
  promisor-remote: accept 0 as oid_nr in function
2020-04-28 15:50:04 -07:00
Jonathan Tan
95acf11a3d diff: restrict when prefetching occurs
Commit 7fbbcb21b1 ("diff: batch fetching of missing blobs", 2019-04-08)
optimized "diff" by prefetching blobs in a partial clone, but there are
some cases wherein blobs do not need to be prefetched. In these cases,
any command that uses the diff machinery will unnecessarily fetch blobs.

diffcore_std() may read blobs when it calls the following functions:
 (1) diffcore_skip_stat_unmatch() (controlled by the config variable
     diff.autorefreshindex)
 (2) diffcore_break() and diffcore_merge_broken() (for break-rewrite
     detection)
 (3) diffcore_rename() (for rename detection)
 (4) diffcore_pickaxe() (for detecting addition/deletion of specified
     string)

Instead of always prefetching blobs, teach diffcore_skip_stat_unmatch(),
diffcore_break(), and diffcore_rename() to prefetch blobs upon the first
read of a missing object. This covers (1), (2), and (3): to cover the
rest, teach diffcore_std() to prefetch if the output type is one that
includes blob data (and hence blob data will be required later anyway),
or if it knows that (4) will be run.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-07 16:09:29 -07:00
Jonathan Tan
c14b6f83ec diff: refactor object read
Refactor the object reads in diff_populate_filespec() to have the first
object read not be in an if/else branch, because in a future patch, a
retry will be added to that first object read.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-07 16:09:29 -07:00
Jonathan Tan
1c37e86ab2 diff: make diff_populate_filespec_options struct
The behavior of diff_populate_filespec() currently can be customized
through a bitflag, but a subsequent patch requires it to support a
non-boolean option. Replace the bitflag with an options struct.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-07 16:09:29 -07:00
Jonathan Tan
db7ed7418b promisor-remote: accept 0 as oid_nr in function
There are 3 callers to promisor_remote_get_direct() that first check if
the number of objects to be fetched is equal to 0. Fold that check into
promisor_remote_get_direct(), and in doing so, be explicit as to what
promisor_remote_get_direct() does if oid_nr is 0 (it returns 0, success,
immediately).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-02 12:42:32 -07:00
brian m. carlson
c397aac02f convert: provide additional metadata to filters
Now that we have the codebase wired up to pass any additional metadata
to filters, let's collect the additional metadata that we'd like to
pass.

The two main places we pass this metadata are checkouts and archives.
In these two situations, reading HEAD isn't a valid option, since HEAD
isn't updated for checkouts until after the working tree is written and
archives can accept an arbitrary tree.  In other situations, HEAD will
usually reflect the refname of the branch in current use.

We pass a smaller amount of data in other cases, such as git cat-file,
where we can really only logically know about the blob.

This commit updates only the parts of the checkout code where we don't
use unpack_trees.  That function and callers of it will be handled in a
future commit.

In the archive code, we leak a small amount of memory, since nothing we
pass in the archiver argument structure is freed.

Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-16 11:37:02 -07:00
brian m. carlson
ab90ecae99 convert: permit passing additional metadata to filter processes
There are a variety of situations where a filter process can make use of
some additional metadata.  For example, some people find the ident
filter too limiting and would like to include the commit or the branch
in their smudged files.  This information isn't available during
checkout as HEAD hasn't been updated at that point, and it wouldn't be
available in archives either.

Let's add a way to pass this metadata down to the filter.  We pass the
blob we're operating on, the treeish (preferring the commit over the
tree if one exists), and the ref we're operating on.  Note that we won't
pass this information in all cases, such as when renormalizing or when
we're performing diffs, since it doesn't make sense in those cases.

The data we currently get from the filter process looks like the
following:

  command=smudge
  pathname=git.c
  0000

With this change, we'll get data more like this:

  command=smudge
  pathname=git.c
  refname=refs/tags/v2.25.1
  treeish=c522f061d551c9bb8684a7c3859b2ece4499b56b
  blob=7be7ad34bd053884ec48923706e70c81719a8660
  0000

There are a couple things to note about this approach.  For operations
like checkout, treeish will always be a commit, since we cannot check
out individual trees, but for other operations, like archive, we can end
up operating on only a particular tree, so we'll provide only a tree as
the treeish.  Similar comments apply for refname, since there are a
variety of cases in which we won't have a ref.

This commit wires up the code to print this information, but doesn't
pass any of it at this point.  In a future commit, we'll have various
code paths pass the actual useful data down.

Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-16 11:37:02 -07:00
Junio C Hamano
78e67cda42 Merge branch 'mt/use-passed-repo-more-in-funcs'
Some codepaths were given a repository instance as a parameter to
work in the repository, but passed the_repository instance to its
callees, which has been cleaned up (somewhat).

* mt/use-passed-repo-more-in-funcs:
  sha1-file: allow check_object_signature() to handle any repo
  sha1-file: pass git_hash_algo to hash_object_file()
  sha1-file: pass git_hash_algo to write_object_file_prepare()
  streaming: allow open_istream() to handle any repo
  pack-check: use given repo's hash_algo at verify_packfile()
  cache-tree: use given repo's hash_algo at verify_one()
  diff: make diff_populate_filespec() honor its repo argument
2020-02-14 12:54:22 -08:00
Jeff King
da8063522f diff: move diff.wsErrorHighlight to "basic" config
We parse diff.wsErrorHighlight in git_diff_ui_config(), meaning that it
doesn't take effect for plumbing commands, only for porcelains like
git-diff itself. This is mildly annoying as it means scripts like
add--interactive, which produce a user-visible diff with color, don't
respect the option.

We could teach that script to parse the config and pass it along as
--ws-error-highlight to the diff plumbing. But there's a simpler
solution.

It should be reasonably safe for plumbing to respect this option, as it
only kicks in when color is otherwise enabled. And anybody parsing
colorized output must already deal with the fact that color.diff.* may
change the exact output they see; those options have been part of
git_diff_basic_config() since its inception in 9a1805a872 (add a "basic"
diff config callback, 2008-01-04).

So we can just move it to the "basic" config, which fixes
add--interactive, along with any other script in the same boat, with a
very low risk of hurting any plumbing users.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 12:01:19 -08:00
Matheus Tavares
eb999b3295 diff: make diff_populate_filespec() honor its repo argument
diff_populate_filespec() takes a struct repository argument but it
doesn't get passed down to read_object_file().

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 10:45:39 -08:00
Junio C Hamano
f7998d9793 Merge branch 'js/builtin-add-i'
The beginning of rewriting "git add -i" in C.

* js/builtin-add-i:
  built-in add -i: implement the `help` command
  built-in add -i: use color in the main loop
  built-in add -i: support `?` (prompt help)
  built-in add -i: show unique prefixes of the commands
  built-in add -i: implement the main loop
  built-in add -i: color the header in the `status` command
  built-in add -i: implement the `status` command
  diff: export diffstat interface
  Start to implement a built-in version of `git add --interactive`
2019-12-05 12:52:43 -08:00
Daniel Ferreira
e4cb659ebd diff: export diffstat interface
Make the diffstat interface (namely, the diffstat_t struct and
compute_diffstat) no longer be internal to diff.c and allow it to be used
by other parts of git.

This is helpful for code that may want to easily extract information
from files using the diff machinery, while flushing it differently from
how the show_* functions used by diff_flush() do it. One example is the
builtin implementation of git-add--interactive's status.

Signed-off-by: Daniel Ferreira <bnmvco@gmail.com>
Signed-off-by: Slavica Đukić <slawica92@hotmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-14 11:10:04 +09:00
Elijah Newren
15beaaa3d1 Fix spelling errors in code comments
Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-10 16:00:54 +09:00
Junio C Hamano
5efabc7ed9 Merge branch 'ew/hashmap'
Code clean-up of the hashmap API, both users and implementation.

* ew/hashmap:
  hashmap_entry: remove first member requirement from docs
  hashmap: remove type arg from hashmap_{get,put,remove}_entry
  OFFSETOF_VAR macro to simplify hashmap iterators
  hashmap: introduce hashmap_free_entries
  hashmap: hashmap_{put,remove} return hashmap_entry *
  hashmap: use *_entry APIs for iteration
  hashmap_cmp_fn takes hashmap_entry params
  hashmap_get{,_from_hash} return "struct hashmap_entry *"
  hashmap: use *_entry APIs to wrap container_of
  hashmap_get_next returns "struct hashmap_entry *"
  introduce container_of macro
  hashmap_put takes "struct hashmap_entry *"
  hashmap_remove takes "const struct hashmap_entry *"
  hashmap_get takes "const struct hashmap_entry *"
  hashmap_add takes "struct hashmap_entry *"
  hashmap_get_next takes "const struct hashmap_entry *"
  hashmap_entry_init takes "struct hashmap_entry *"
  packfile: use hashmap_entry in delta_base_cache_entry
  coccicheck: detect hashmap_entry.hash assignment
  diff: use hashmap_entry_init on moved_entry.ent
2019-10-15 13:48:02 +09:00
Junio C Hamano
676278f8ea Merge branch 'bc/object-id-part17'
Preparation for SHA-256 upgrade continues.

* bc/object-id-part17: (26 commits)
  midx: switch to using the_hash_algo
  builtin/show-index: replace sha1_to_hex
  rerere: replace sha1_to_hex
  builtin/receive-pack: replace sha1_to_hex
  builtin/index-pack: replace sha1_to_hex
  packfile: replace sha1_to_hex
  wt-status: convert struct wt_status to object_id
  cache: remove null_sha1
  builtin/worktree: switch null_sha1 to null_oid
  builtin/repack: write object IDs of the proper length
  pack-write: use hash_to_hex when writing checksums
  sequencer: convert to use the_hash_algo
  bisect: switch to using the_hash_algo
  sha1-lookup: switch hard-coded constants to the_hash_algo
  config: use the_hash_algo in abbrev comparison
  combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo
  bundle: switch to use the_hash_algo
  connected: switch GIT_SHA1_HEXSZ to the_hash_algo
  show-index: switch hard-coded constants to the_hash_algo
  blame: remove needless comparison with GIT_SHA1_HEXSZ
  ...
2019-10-11 14:24:46 +09:00
Eric Wong
404ab78e39 hashmap: remove type arg from hashmap_{get,put,remove}_entry
Since these macros already take a `keyvar' pointer of a known type,
we can rely on OFFSETOF_VAR to get the correct offset without
relying on non-portable `__typeof__' and `offsetof'.

Argument order is also rearranged, so `keyvar' and `member' are
sequential as they are used as: `keyvar->member'

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:12 +09:00
Eric Wong
23dee69f53 OFFSETOF_VAR macro to simplify hashmap iterators
While we cannot rely on a `__typeof__' operator being portable
to use with `offsetof'; we can calculate the pointer offset
using an existing pointer and the address of a member using
pointer arithmetic for compilers without `__typeof__'.

This allows us to simplify usage of hashmap iterator macros
by not having to specify a type when a pointer of that type
is already given.

In the future, list iterator macros (e.g. list_for_each_entry)
may also be implemented using OFFSETOF_VAR to save hackers the
trouble of using container_of/list_entry macros and without
relying on non-portable `__typeof__'.

v3: use `__typeof__' to avoid clang warnings

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong
c8e424c9c9 hashmap: introduce hashmap_free_entries
`hashmap_free_entries' behaves like `container_of' and passes
the offset of the hashmap_entry struct to the internal
`hashmap_free_' function, allowing the function to free any
struct pointer regardless of where the hashmap_entry field
is located.

`hashmap_free' no longer takes any arguments aside from
the hashmap itself.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong
939af16eac hashmap_cmp_fn takes hashmap_entry params
Another step in eliminating the requirement of hashmap_entry
being the first member of a struct.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong
f0e63c4113 hashmap: use *_entry APIs to wrap container_of
Using `container_of' can be verbose and choosing names for
intermediate "struct hashmap_entry" pointers is a hard problem.
So introduce "*_entry" APIs inspired by similar linked-list
APIs in the Linux kernel.

Unfortunately, `__typeof__' is not portable C, so we need an
extra parameter to specify the type.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong
6bcbdfb277 hashmap_get_next returns "struct hashmap_entry *"
This is a step towards removing the requirement for
hashmap_entry being the first field of a struct.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong
b6c5241606 hashmap_get takes "const struct hashmap_entry *"
This is less error-prone than "const void *" as the compiler
now detects invalid types being passed.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong
b94e5c1df6 hashmap_add takes "struct hashmap_entry *"
This is less error-prone than "void *" as the compiler now
detects invalid types being passed.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong
f6eb6bdcf2 hashmap_get_next takes "const struct hashmap_entry *"
This is less error-prone than "const void *" as the compiler
now detects invalid types being passed.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong
e010a41216 diff: use hashmap_entry_init on moved_entry.ent
Otherwise, the hashmap_entry.next field appears to remain
uninitialized, which can lead to problems when
add_lines_to_move_detection calls hashmap_add.

I found this through manual inspection when converting
hashmap_add callers to take "struct hashmap_entry *".

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:09 +09:00
Junio C Hamano
b9ac6c59b8 Merge branch 'cc/multi-promisor'
Teach the lazy clone machinery that there can be more than one
promisor remote and consult them in order when downloading missing
objects on demand.

* cc/multi-promisor:
  Move core_partial_clone_filter_default to promisor-remote.c
  Move repository_format_partial_clone to promisor-remote.c
  Remove fetch-object.{c,h} in favor of promisor-remote.{c,h}
  remote: add promisor and partial clone config to the doc
  partial-clone: add multiple remotes in the doc
  t0410: test fetching from many promisor remotes
  builtin/fetch: remove unique promisor remote limitation
  promisor-remote: parse remote.*.partialclonefilter
  Use promisor_remote_get_direct() and has_promisor_remote()
  promisor-remote: use repository_format_partial_clone
  promisor-remote: add promisor_remote_reinit()
  promisor-remote: implement promisor_remote_get_direct()
  Add initial support for many promisor remotes
  fetch-object: make functions return an error code
  t0410: remove pipes after git commands
2019-09-18 11:50:09 -07:00
Junio C Hamano
d8b1ce7972 Merge branch 'jt/diff-lazy-fetch-submodule-fix'
On-demand object fetching in lazy clone incorrectly tried to fetch
commits from submodule projects, while still working in the
superproject, which has been corrected.

* jt/diff-lazy-fetch-submodule-fix:
  diff: skip GITLINK when lazy fetching missing objs
2019-09-09 12:26:38 -07:00
Jonathan Tan
a63694f523 diff: skip GITLINK when lazy fetching missing objs
In 7fbbcb21b1 ("diff: batch fetching of missing blobs", 2019-04-08),
diff was taught to batch the fetching of missing objects when operating
on a partial clone, but was not taught to refrain from fetching
GITLINKs. Teach diff to check if an object is a GITLINK before including
it in the set to be fetched.

(As stated in the commit message of that commit, unpack-trees was also
taught a similar thing prior, but unpack-trees correctly checks for
GITLINK before including objects in the set to be fetched.)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-20 15:04:26 -07:00
brian m. carlson
36261e42ec patch-id: convert to use the_hash_algo
Convert the two separate patch-id implementations to use the_hash_algo
in their implementation.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19 15:04:57 -07:00
Junio C Hamano
58f6cfd8ce Merge branch 'js/unmap-before-ext-diff' into maint
Windows update.

* js/unmap-before-ext-diff:
  diff: munmap() file contents before running external diff
2019-07-29 12:38:11 -07:00
Junio C Hamano
d9beb468e6 Merge branch 'js/unmap-before-ext-diff'
Windows update.

* js/unmap-before-ext-diff:
  diff: munmap() file contents before running external diff
2019-07-25 13:59:21 -07:00
Thomas Gummerer
430be36eb5 range-diff: suppress line count in outer diff
The line count in the outer diff's hunk headers of a range diff is not
all that interesting.  It merely shows how far along the inner diff
are on both sides.  That number is of no use for human readers, and
range-diffs are not meant to be machine readable.

In a subsequent commit we're going to add some more contextual
information such as the filename corresponding to the diff to the hunk
headers.  Remove the unnecessary information, and just keep the "@@"
to indicate that a new hunk of the outer diff is starting.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-11 14:29:27 -07:00
Johannes Schindelin
3aef54e8b8 diff: munmap() file contents before running external diff
When running an external diff from, say, a diff tool, it is safe to
assume that we want to write the files in question. On Windows, that
means that there cannot be any other process holding an open handle to
said files, or even just a mapped region.

So let's make sure that `git diff` itself is not holding any open handle
to the files in question.

In fact, we will just release the file pair right away, as the external
diff uses the files we just wrote, so we do not need to hold the file
contents in memory anymore.

This fixes https://github.com/git-for-windows/git/issues/1315

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-11 12:11:54 -07:00
Christian Couder
b14ed5adaf Use promisor_remote_get_direct() and has_promisor_remote()
Instead of using the repository_format_partial_clone global
and fetch_objects() directly, let's use has_promisor_remote()
and promisor_remote_get_direct().

This way all the configured promisor remotes will be taken
into account, not only the one specified by
extensions.partialClone.

Also when cloning or fetching using a partial clone filter,
remote.origin.promisor will be set to "true" instead of
setting extensions.partialClone to "origin". This makes it
possible to use many promisor remote just by fetching from
them.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25 14:05:37 -07:00
Junio C Hamano
8202d12fca Merge branch 'sb/format-patch-base-patch-id-fix'
The "--base" option of "format-patch" computed the patch-ids for
prerequisite patches in an unstable way, which has been updated to
compute in a way that is compatible with "git patch-id --stable".

* sb/format-patch-base-patch-id-fix:
  format-patch: make --base patch-id output stable
  format-patch: inform user that patch-id generation is unstable
2019-06-13 13:18:46 -07:00
Jiang Xin
8a1569d655 i18n: fix typos found during l10n for git 2.22.0
Fix two typos introduced by the following commits:

+ 31fba9d3b4 (diff-parseopt: convert --[src|dst]-prefix, 2019-03-24)
+ ed8b4132c8 (remote-curl: mark all error messages for translation,
  2019-03-05)

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-03 11:10:53 -07:00
Junio C Hamano
936dce6f93 Merge branch 'ja/diff-opt-typofix'
Typofix.

* ja/diff-opt-typofix:
  diff: fix mistake in translatable strings
2019-05-30 10:50:46 -07:00
Junio C Hamano
20aa7c594f Merge branch 'nd/diff-parseopt'
A brown-paper-bag bugfix to a change already in 'master'.

* nd/diff-parseopt:
  parse-options: check empty value in OPT_INTEGER and OPT_ABBREV
  diff-parseopt: restore -U (no argument) behavior
  diff-parseopt: correct variable types that are used by parseopt
2019-05-30 10:50:44 -07:00
Nguyễn Thái Ngọc Duy
8ef05193bc diff-parseopt: restore -U (no argument) behavior
Before d473e2e0e8 (diff.c: convert -U|--unified, 2019-01-27), -U and
--unified are implemented with a custom parser opt_arg() in diff.c. I
didn't check this code carefully and not realize that it's the
equivalent of PARSE_OPT_NONEG | PARSE_OPT_OPTARG.

In other words, if -U is specified without any argument, the option
should be accepted, and the default value should be used. Without
PARSE_OPT_OPTARG, parse_options() will reject this case and cause a
regression.

Reported-by: Bryan Turner <bturner@atlassian.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-29 11:04:32 -07:00
Jean-Noël Avila
8bcd8f4cea diff: fix mistake in translatable strings
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-19 11:01:02 +09:00
Stephen Boyd
a8f6855f48 format-patch: make --base patch-id output stable
We weren't flushing the context each time we processed a hunk in the
patch-id generation code in diff.c, but we were doing that when we
generated "stable" patch-ids with the 'patch-id' tool. Let's port that
similar logic over from patch-id.c into diff.c so we can get the same
hash when we're generating patch-ids for 'format-patch --base=' types of
command invocations.

Cc: Xiaolong Ye <xiaolong.ye@intel.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-08 19:27:43 +09:00
Junio C Hamano
32dc15dec1 Merge branch 'jt/batch-fetch-blobs-in-diff'
While running "git diff" in a lazy clone, we can upfront know which
missing blobs we will need, instead of waiting for the on-demand
machinery to discover them one by one.  Aim to achieve better
performance by batching the request for these promised blobs.

* jt/batch-fetch-blobs-in-diff:
  diff: batch fetching of missing blobs
  sha1-file: support OBJECT_INFO_FOR_PREFETCH
2019-04-25 16:41:19 +09:00
Junio C Hamano
dcd6a8c09a Merge branch 'nd/diff-parseopt-4'
Fourth batch to teach the diff machinery to use the parse-options
API.

* nd/diff-parseopt-4:
  am: avoid diff_opt_parse()
  diff --no-index: use parse_options() instead of diff_opt_parse()
  range-diff: use parse_options() instead of diff_opt_parse()
  diff.c: allow --no-color-moved-ws
  diff-parseopt: convert --color-moved-ws
  diff-parseopt: convert --[no-]color-moved
  diff-parseopt: convert --inter-hunk-context
  diff-parseopt: convert --no-prefix
  diff-parseopt: convert --line-prefix
  diff-parseopt: convert --[src|dst]-prefix
  diff-parseopt: convert --[no-]abbrev
  diff-parseopt: convert --diff-filter
  diff-parseopt: convert --find-object
  diff-parseopt: convert -O
  diff-parseopt: convert --pickaxe-all|--pickaxe-regex
  diff-parseopt: convert -S|-G
  diff-parseopt: convert -l
  diff-parseopt: convert -z
  diff-parseopt: convert --ita-[in]visible-in-index
  diff-parseopt: convert --ws-error-highlight
2019-04-25 16:41:12 +09:00
Junio C Hamano
8a9a837a63 Merge branch 'nd/diff-parseopt-3'
Third batch to teach the diff machinery to use the parse-options
API.

* nd/diff-parseopt-3:
  diff-parseopt: convert --submodule
  diff-parseopt: convert --ignore-submodules
  diff-parseopt: convert --textconv
  diff-parseopt: convert --ext-diff
  diff-parseopt: convert --quiet
  diff-parseopt: convert --exit-code
  diff-parseopt: convert --color-words
  diff-parseopt: convert --word-diff-regex
  diff-parseopt: convert --word-diff
  diff-parseopt: convert --[no-]color
  diff-parseopt: convert --[no-]follow
  diff-parseopt: convert -R
  diff-parseopt: convert -a|--text
  diff-parseopt: convert --full-index
  diff-parseopt: convert --binary
  diff-parseopt: convert --anchored
  diff-parseopt: convert --diff-algorithm
  diff-parseopt: convert --histogram
  diff-parseopt: convert --patience
  diff-parseopt: convert --[no-]indent-heuristic
2019-04-16 19:28:03 +09:00
Jonathan Tan
7fbbcb21b1 diff: batch fetching of missing blobs
When running a command like "git show" or "git diff" in a partial clone,
batch all missing blobs to be fetched as one request.

This is similar to c0c578b33c ("unpack-trees: batch fetching of missing
blobs", 2017-12-08), but for another command.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-08 14:04:50 +09:00
Nguyễn Thái Ngọc Duy
cdb5330a9b am: avoid diff_opt_parse()
diff_opt_parse() is a heavy hammer to just set diff filter. But it's
the only way because of the diff_status_letters[] mapping. Add a new
API to set diff filter and use it in git-am. diff_opt_parse()'s only
remaining call site in revision.c will be gone soon and having it here
just because of git-am does not make sense.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:24 +09:00
Nguyễn Thái Ngọc Duy
bb9872904e diff.c: allow --no-color-moved-ws
This option is added in commit b73bcbac4a (diff: allow
--no-color-moved-ws - 2018-11-23) in pw/diff-color-moved-ws-fix. To ease
merge conflict resolution, re-implement the option handling here so that
the conflict could be resolved by taking this side of change.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:23 +09:00
Nguyễn Thái Ngọc Duy
8ce2020ff0 diff-parseopt: convert --color-moved-ws
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:23 +09:00
Nguyễn Thái Ngọc Duy
59311a9820 diff-parseopt: convert --[no-]color-moved
Mark one more string for translation while at there

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:23 +09:00
Nguyễn Thái Ngọc Duy
16ed6c97cc diff-parseopt: convert --inter-hunk-context
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:23 +09:00
Nguyễn Thái Ngọc Duy
11c659d890 diff-parseopt: convert --no-prefix
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:23 +09:00
Nguyễn Thái Ngọc Duy
2f81cf9895 diff-parseopt: convert --line-prefix
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:23 +09:00
Nguyễn Thái Ngọc Duy
31fba9d3b4 diff-parseopt: convert --[src|dst]-prefix
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:23 +09:00
Nguyễn Thái Ngọc Duy
d877418390 diff-parseopt: convert --[no-]abbrev
OPT__ABBREV() has the same behavior as the deleted code with one
difference: it does check for valid number and error out if not. And the
'40' change is self explanatory.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:22 +09:00
Nguyễn Thái Ngọc Duy
d2d3f27300 diff-parseopt: convert --diff-filter
While at it, mark one more string for translation

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:22 +09:00
Nguyễn Thái Ngọc Duy
a75f28cbda diff-parseopt: convert --find-object
While at it, mark one more string for translation.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:22 +09:00
Nguyễn Thái Ngọc Duy
f731814b3a diff-parseopt: convert -O
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:22 +09:00
Nguyễn Thái Ngọc Duy
85f8e889ea diff-parseopt: convert --pickaxe-all|--pickaxe-regex
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:22 +09:00
Nguyễn Thái Ngọc Duy
a41cfb3203 diff-parseopt: convert -S|-G
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:22 +09:00
Nguyễn Thái Ngọc Duy
bffee749a8 diff-parseopt: convert -l
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 22:21:22 +09:00