Remove the dummy discard_hunk_line() function added in
3b40a090fd (diff: avoid generating unused hunk header lines,
2018-11-02) in favor of having a new XDL_EMIT_NO_HUNK_HDR flag, for
use along with the two existing and similar XDL_EMIT_* flags.
Unlike the recently amended xdiff_emit_line_fn interface which'll be
called in a loop in xdl_emit_diff(), the hunk header is only emitted
once.
It makes more sense to pass this as a flag than provide a dummy
callback because that function may be able to skip doing certain work
if it knows the caller is doing nothing with the hunk header.
It would be possible to do so in the case of -U0 now, but the benefit
of doing so is so small that I haven't bothered. But this leaves the
door open to that, and more importantly makes the API use more
intuitive.
The reason we're putting a flag in the gap between 1<<0 and 1<<2 is
that the old 1<<1 flag was removed in 907681e940 (xdiff: drop
XDL_EMIT_COMMON, 2016-02-23) without re-ordering the remaining flags.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Solve a long-standing item for "git log -Grx" of us e.g. finding "+
str" in the diff context and noting that we had a "hit", but xdiff
diligently continuing to generate and spew the rest of the diff at
us. This makes use of a new "early return" xdiff interface added by
preceding commits.
The TODO item (or, the NEEDSWORK comment) has been there since "git
log -G" was implemented. See f506b8e8b5 (git log/diff: add -G<regexp>
that greps in the patch text, 2010-08-23).
But now with the support added in the preceding changes to the
xdiff-interface we can return early. Let's assert the behavior of that
new early-return xdiff-interface by having a BUG() call here to die if
it ever starts handing us needless work again.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Finish the change started in the preceding commit and allow an early
return from "xdiff_emit_line_fn" callbacks, this will allows
diffcore-pickaxe.c to save itself redundant work.
Our xdiff interface also had the limitation of not being able to abort
early since the beginning, see d9ea73e056 (combine-diff: refactor
built-in xdiff interface., 2006-04-05). Although at that time
"xdiff_emit_line_fn" was called "xdiff_emit_consume_fn", and
"xdiff_emit_hunk_fn" didn't exist yet.
There was some work in this area of xdiff-interface.[ch] recently with
3b40a090fd (diff: avoid generating unused hunk header lines,
2018-11-02) and 7c61e25fbf (diff: use hunk callback for word-diff,
2018-11-02).
In combination those two changes allow us to not do any work on the
hunks and diff at all, but didn't change the status quo with regards
to consumers that e.g. want the diff lines, but might want to abort
early.
Whereas now we can abort e.g. on the first "-line" of a 1000 line diff
if that's all we needed.
This interface is rather scary as noted in the comment to
xdiff-interface.h being added here, as noted there a future change
could add more exit codes, and hack xdl_emit_diff() and friends to
ignore or skip things more selectively as a result.
I did not see an inherent reason for why xdl_emit_{diffrec,record}()
could not be changed to ferry the "xdiff_emit_line_fn" error code
upwards instead of returning -1 on all "ret < 0".
But doing so would require corresponding changes in xdl_emit_diff(),
xdl_diff(). I didn't see any issue with narrowly doing that to
accomplish what I needed here, but it would leave xdiff's own return
values in an inconsistent state.
Instead I've left it at returning a more conventional (for git's own
codebase) 1 for an early return, and translating it (or rather, all
non-zero) to -1 for xdiff's consumption.
The reason for most of the "stop" complexity in xdiff_outf() is
because we want to be able to abort early, but do so in a way that
doesn't skip the appropriate strbuf_reset() invocations.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the function prototype of xdiff_emit_line_fn to return an "int"
instead of "void". Change all of those functions to "return 0",
nothing checks those return values yet, and no behavior is being
changed.
In subsequent commits the interface will be changed to allow early
return via this new return value.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In previous patches, extern was mechanically removed from function
declarations without care to formatting, causing parameter lists to be
misaligned. Manually format changed sections such that the parameter
lists should be realigned.
Viewing this patch with 'git diff -w' should produce no output.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There has been a push to remove extern from function declarations.
Remove some instances of "extern" for function declarations which are
caught by Coccinelle. Note that Coccinelle has some difficulty with
processing functions with `__attribute__` or varargs so some `extern`
declarations are left behind to be dealt with in a future patch.
This was the Coccinelle patch used:
@@
type T;
identifier f;
@@
- extern
T f(...);
and it was run with:
$ git ls-files \*.{c,h} |
grep -v ^compat/ |
xargs spatch --sp-file contrib/coccinelle/noextern.cocci --in-place
Files under `compat/` are intentionally excluded as some are directly
copied from external sources and we should avoid churning them as much
as possible.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function was used only for parsing the hunk headers generated by
xdiff. Now that we can use hunk callbacks to get that information
directly, it has outlived its usefulness.
Note to anyone who wants to resurrect it: the "len" parameter was
totally unused, meaning that the function could read past the end of the
"line" array. In practice this never happened, because we only used it
to parse xdiff's generated header lines. But it would be dangerous to
use it for other cases without fixing this defect.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some callers of xdi_diff_outf() do not look at the generated hunk header
lines at all. By plugging in a no-op hunk callback, this tells xdiff not
to even bother formatting them.
This patch introduces a stock no-op callback and uses it with a few
callers whose line callbacks explicitly ignore hunk headers (because
they look only for +/- lines).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit taught xdiff to optionally provide the hunk header
data to a specialized callback. But most users of xdiff actually use our
more convenient xdi_diff_outf() helper, which ensures that our callbacks
are always fed whole lines.
Let's plumb the special hunk-callback through this interface, too. It
will follow the same rule as xdiff when the hunk callback is NULL (i.e.,
continue to pass a stringified hunk header to the line callback). Since
we add NULL to each caller, there should be no behavior change yet.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will turn out to be useful in a later patch.
xdl_recmatch is exported in xdiff/xutils.h, to be used by various
xdiff/*.c files, but not outside of xdiff/. This one makes it available
to the outside, too.
While at it, add documentation.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since all of its callers have been updated, convert read_mmblob to take
a pointer to struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The xdiff code is not prepared to handle extremely large
files. It uses "int" in many places, which can overflow if
we have a very large number of lines or even bytes in our
input files. This can cause us to produce incorrect diffs,
with no indication that the output is wrong. Or worse, we
may even underallocate a buffer whose size is the result of
an overflowing addition.
We're much better off to tell the user that we cannot diff
or merge such a large file. This patch covers both cases,
but in slightly different ways:
1. For merging, we notice the large file and cleanly fall
back to a binary merge (which is effectively "we cannot
merge this").
2. For diffing, we make the binary/text distinction much
earlier, and in many different places. For this case,
we'll use the xdi_diff as our choke point, and reject
any diff there before it hits the xdiff code.
This means in most cases we'll die() immediately after.
That's not ideal, but in practice we shouldn't
generally hit this code path unless the user is trying
to do something tricky. We already consider files
larger than core.bigfilethreshold to be binary, so this
code would only kick in when that is circumvented
(either by bumping that value, or by using a
.gitattribute to mark a file as diffable).
In other words, we can avoid being "nice" here, because
there is already nice code that tries to do the right
thing. We are adding the suspenders to the nice code's
belt, so notice when it has been worked around (both to
protect the user from malicious inputs, and because it
is better to die() than generate bogus output).
The maximum size was chosen after experimenting with feeding
large files to the xdiff code. It's just under a gigabyte,
which leaves room for two obvious cases:
- a diff3 merge conflict result on files of maximum size X
could be 3*X plus the size of the markers, which would
still be only about 3G, which fits in a 32-bit int.
- some of the diff code allocates arrays of one int per
record. Even if each file consists only of blank lines,
then a file smaller than 1G will have fewer than 1G
records, and therefore the int array will fit in 4G.
Since the limit is arbitrary anyway, I chose to go under a
gigabyte, to leave a safety margin (e.g., we would not want
to overflow by allocating "(records + 1) * sizeof(int)" or
similar.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdi_diff_outf() overrides the structure members of its last parameter,
ignoring any value that callers pass in. It's no surprise then that all
callers pass a pointer to an uninitialized structure. They also don't
read it after the call, so the parameter is neither used for input nor
for output. Turn it into a local variable of xdi_diff_outf().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The following function is duplicated:
fill_mm
Move it to xdiff-interface.c and rename it 'read_mmblob', as suggested
by Junio C Hamano.
Also, change parameters order for consistency with read_mmfile().
Signed-off-by: Michael Lukashov <michael.lukashov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xdiff_set_find_func() is used to set user defined regular expressions
for finding function signatures. Add xdiff_clear_find_func(), which
frees the memory allocated by the former, making the API complete.
Also, use the new function in diff.c (the only call site of
xdiff_set_find_func()) to clean up after ourselves.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Based on a patch by Brian Downing, this uses the xdiff emit_func feature
to implement xdi_diff_hunks(). It's a function that calls a callback for
each hunk of a diff, passing its lengths.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bc/master-diff-hunk-header-fix:
Clarify commit error message for unmerged files
Use strchrnul() instead of strchr() plus manual workaround
Use remove_path from dir.c instead of own implementation
Add remove_path: a function to remove as much as possible of a path
git-submodule: Fix "Unable to checkout" for the initial 'update'
Clarify how the user can satisfy stash's 'dirty state' check.
t4018-diff-funcname: test syntax of builtin xfuncname patterns
t4018-diff-funcname: test syntax of builtin xfuncname patterns
make "git remote" report multiple URLs
diff hunk pattern: fix misconverted "\{" tex macro introducers
diff: fix "multiple regexp" semantics to find hunk header comment
diff: use extended regexp to find hunk headers
diff: use extended regexp to find hunk headers
diff.*.xfuncname which uses "extended" regex's for hunk header selection
diff.c: associate a flag with each pattern and use it for compiling regex
diff.c: return pattern entry pointer rather than just the hunk header pattern
Conflicts:
builtin-merge-recursive.c
t/t7201-co.sh
xdiff-interface.h
* jc/better-conflict-resolution:
Fix AsciiDoc errors in merge documentation
git-merge documentation: describe how conflict is presented
checkout --conflict=<style>: recreate merge in a non-default style
checkout -m: recreate merge when checking out of unmerged index
git-merge-recursive: learn to honor merge.conflictstyle
merge.conflictstyle: choose between "merge" and "diff3 -m" styles
rerere: understand "diff3 -m" style conflicts with the original
rerere.c: use symbolic constants to keep track of parsing states
xmerge.c: "diff3 -m" style clips merge reduction level to EAGER or less
xmerge.c: minimum readability fixups
xdiff-merge: optionally show conflicts in "diff3 -m" style
xdl_fill_merge_buffer(): separate out a too deeply nested function
checkout --ours/--theirs: allow checking out one side of a conflicting merge
checkout -f: allow ignoring unmerged paths when checking out of the index
Conflicts:
Documentation/git-checkout.txt
builtin-checkout.c
builtin-merge-recursive.c
t/t7201-co.sh
* bc/maint-diff-hunk-header-fix:
diff.*.xfuncname which uses "extended" regex's for hunk header selection
diff.c: associate a flag with each pattern and use it for compiling regex
diff.c: return pattern entry pointer rather than just the hunk header pattern
Cosmetical command name fix
Start conforming code to "git subcmd" style part 3
t9700/test.pl: remove File::Temp requirement
t9700/test.pl: avoid bareword 'STDERR' in 3-argument open()
GIT 1.6.0.2
Fix some manual typos.
Use compatibility regex library also on FreeBSD
Use compatibility regex library also on AIX
Update draft release notes for 1.6.0.2
Use compatibility regex library for OSX/Darwin
git-svn: Fixes my() parameter list syntax error in pre-5.8 Perl
Git.pm: Use File::Temp->tempfile instead of ->new
t7501: always use test_cmp instead of diff
Start conforming code to "git subcmd" style part 2
diff: Help "less" hide ^M from the output
checkout: do not check out unmerged higher stages randomly
Conflicts:
Documentation/git.txt
Documentation/gitattributes.txt
Makefile
diff.c
t/t7201-co.sh
This is in preparation for allowing extended regular expression patterns.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This teaches "git merge-file" to honor merge.conflictstyle configuration
variable, whose value can be "merge" (default) or "diff3".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This further enhances xdi_diff_outf() interface so that it takes two
common parameters: the callback function that processes one line at a
time, and a pointer to its application specific callback data structure.
xdi_diff_outf() creates its own "xdiff_emit_state" structure and stashes
these two away inside it, which is used by the lowest level output
function in the xdiff_outf() callchain, consume_one(), to call back to the
application layer. With this restructuring, we lift the requirement that
the caller supplied callback data structure embeds xdiff_emit_state
structure as its first member.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continually xreallocing and freeing the remainder member of struct
xdiff_emit_state was a noticeable performance hit. Use a strbuf
instead.
This yields a decent performance improvement on "git blame" on certain
repositories. For example, before this commit:
$ time git blame -M -C -C -p --incremental server.c >/dev/null
101.52user 0.17system 1:41.73elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+39561minor)pagefaults 0swaps
With this commit:
$ time git blame -M -C -C -p --incremental server.c >/dev/null
80.38user 0.30system 1:20.81elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+50979minor)pagefaults 0swaps
Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To prepare for the need to initialize and release resources for an
xdi_diff with the xdiff_outf output function, make a new function to
wrap this usage.
Old:
ecb.outf = xdiff_outf;
ecb.priv = &state;
...
xdi_diff(file_p, file_o, &xpp, &xecfg, &ecb);
New:
xdi_diff_outf(file_p, file_o, &state.xm, &xpp, &xecfg, &ecb);
Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This inserts a new function xdi_diff() that currently does not
do anything other than calling the underlying xdl_diff() to the
callchain of current callers of xdl_diff() function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes"diff -p" hunk headers customizable via gitattributes mechanism.
It is based on Johannes's earlier patch that allowed to define a single
regexp to be used for everything.
The mechanism to arrive at the regexp that is used to define hunk header
is the same as other use of gitattributes. You assign an attribute, funcname
(because "diff -p" typically uses the name of the function the patch is about
as the hunk header), a simple string value. This can be one of the names of
built-in pattern (currently, "java" is defined) or a custom pattern name, to
be looked up from the configuration file.
(in .gitattributes)
*.java funcname=java
*.perl funcname=perl
(in .git/config)
[funcname]
java = ... # ugly and complicated regexp to override the built-in one.
perl = ... # another ugly and complicated regexp to define a new one.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already have two instances where we want to determine if a buffer
contains binary data as opposed to text.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_file() was a useful function if you want to work with the xdiff stuff,
so it was renamed and put into a more central place.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This refactors the line-by-line callback mechanism used in
combine-diff so that other programs can reuse it more easily.
Signed-off-by: Junio C Hamano <junkio@cox.net>