Commit graph

129 commits

Author SHA1 Message Date
Stefan Beller
5e4e5bb539 xdiff: remove unneeded declarations
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 09:26:42 -07:00
Junio C Hamano
a52fb9b8f3 Merge branch 'js/ignore-space-at-eol' into maint
An age old bug that caused "git diff --ignore-space-at-eol"
misbehave has been fixed.

* js/ignore-space-at-eol:
  diff: fix a double off-by-one with --ignore-space-at-eol
  diff: demonstrate a bug with --patience and --ignore-space-at-eol
2016-08-08 14:21:35 -07:00
Johannes Schindelin
044fb190f7 diff: fix a double off-by-one with --ignore-space-at-eol
When comparing two lines, ignoring any whitespace at the end, we first
try to match as many bytes as possible and break out of the loop only
upon mismatch, to let the remainder be handled by the code shared with
the other whitespace-ignoring code paths.

When comparing the bytes, however, we incremented the counters always,
even if the bytes did not match. And because we fall through to  the
space-at-eol handling at that point, it is as if that mismatch never
happened.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-11 11:55:53 -07:00
Junio C Hamano
fda65fadb6 Merge branch 'rs/xdiff-hunk-with-func-line' into maint
"git show -W" (extend hunks to cover the entire function, delimited
by lines that match the "funcname" pattern) used to show the entire
file when a change added an entire function at the end of the file,
which has been fixed.

* rs/xdiff-hunk-with-func-line:
  xdiff: fix merging of appended hunk with -W
  grep: -W: don't extend context to trailing empty lines
  t7810: add test for grep -W and trailing empty context lines
  xdiff: don't trim common tail with -W
  xdiff: -W: don't include common trailing empty lines in context
  xdiff: ignore empty lines before added functions with -W
  xdiff: handle appended chunks better with -W
  xdiff: factor out match_func_rec()
  t4051: rewrite, add more tests
2016-06-27 09:56:24 -07:00
René Scharfe
6f8d9bccb2 xdiff: fix merging of appended hunk with -W
When -W is given we search the lines between the end of the current
context and the next change for a function line.  If there is none then
we merge those two hunks as they must be part of the same function.

If the next change is an appended chunk we abort the search early in
get_func_line(), however, because its line number is out of range.  Fix
that by searching from the end of the pre-image in that case instead.

Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-09 15:27:26 -07:00
René Scharfe
9e6a4cfc38 xdiff: -W: don't include common trailing empty lines in context
Empty lines between functions are shown by diff -W, as it considers them
to be part of the function preceding them.  They are not interesting in
most languages.  The previous patch stopped showing them in the special
case of a function added at the end of a file.

Stop extending context to those empty lines by skipping back over them
from the start of the next function.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-31 13:08:56 -07:00
René Scharfe
392f6d3166 xdiff: ignore empty lines before added functions with -W
If a new function and a preceding empty line is appended, diff -W shows
the previous function in full in order to provide context for that empty
line.  In most languages empty lines between sections are not
interesting in and off themselves and showing a whole extra function for
them is not what we want.

Skip empty lines when checking of the appended chunk starts with a
function line, thereby avoiding to extend the context just for them.

Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-31 13:08:56 -07:00
René Scharfe
6d5badb238 xdiff: handle appended chunks better with -W
If lines are added at the end of a file, diff -W shows the whole file.
That's because get_func_line() only considers the pre-image and gives up
if it sees a record index beyond its end.

Consider the post-image as well to see if the added lines already make
up a full function.  If it doesn't then search for the previous function
line by starting from the bottom of the pre-image, thereby avoiding to
confuse get_func_line().

Reuse the existing label called "again", as it's exactly where we need
to jump to when we're done handling the pre-context, but rename it to
"post_context_calculation" in order to document its new purpose better.

Reported-by: Junio C Hamano <gitster@pobox.com>
Initial-patch-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-31 13:08:56 -07:00
René Scharfe
ff2981f724 xdiff: factor out match_func_rec()
Add match_func_rec(), a helper that wraps accessing a record and calling
the appropriate function for checking if it contains a function line.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-31 13:08:56 -07:00
Junio C Hamano
0018da1088 Merge branch 'jk/diff-compact-heuristic'
Patch output from "git diff" and friends has been tweaked to be
more readable by using a blank line as a strong hint that the
contents before and after it belong to a logically separate unit.

* jk/diff-compact-heuristic:
  diff: undocument the compaction heuristic knobs for experimentation
  xdiff: implement empty line chunk heuristic
  xdiff: add recs_match helper function
2016-05-06 14:45:46 -07:00
Stefan Beller
d634d61ed6 xdiff: implement empty line chunk heuristic
In order to produce the smallest possible diff and combine several diff
hunks together, we implement a heuristic from GNU Diff which moves diff
hunks forward as far as possible when we find common context above and
below a diff hunk. This sometimes produces less readable diffs when
writing C, Shell, or other programming languages, ie:

...
 /*
+ *
+ *
+ */
+
+/*
...

instead of the more readable equivalent of

...
+/*
+ *
+ *
+ */
+
 /*
...

Implement the following heuristic to (optionally) produce the desired
output.

  If there are diff chunks which can be shifted around, shift each hunk
  such that the last common empty line is below the chunk with the rest
  of the context above.

This heuristic appears to resolve the above example and several other
common issues without producing significantly weird results. However, as
with any heuristic it is not really known whether this will always be
more optimal. Thus, it can be disabled via diff.compactionHeuristic.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-19 10:53:34 -07:00
Jacob Keller
92e5b62fec xdiff: add recs_match helper function
It is a common pattern in xdl_change_compact to check that hashes and
strings match. The resulting code to perform this change causes very
long lines and makes it hard to follow the intention. Introduce a helper
function recs_match which performs both checks to increase
code readability.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-18 11:47:08 -07:00
Junio C Hamano
aa3a2c2af6 Merge branch 'rj/xdiff-prepare-plug-leak-on-error-codepath'
A small memory leak in an error codepath has been plugged in xdiff
code.

* rj/xdiff-prepare-plug-leak-on-error-codepath:
  xdiff/xprepare: fix a memory leak
  xdiff/xprepare: use the XDF_DIFF_ALG() macro to access flag bits
2016-04-03 10:29:33 -07:00
Ramsay Jones
87f1625836 xdiff/xprepare: fix a memory leak
The xdl_prepare_env() function may initialise an xdlclassifier_t
data structure via xdl_init_classifier(), which allocates memory
to several fields, for example 'rchash', 'rcrecs' and 'ncha'.
If this function later exits due to the failure of xdl_optimize_ctxs(),
then this xdlclassifier_t structure, and the memory allocated to it,
is not cleaned up.

In order to fix the memory leak, insert a call to xdl_free_classifier()
before returning.

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-04 15:51:08 -08:00
Ramsay Jones
5cd6978a9c xdiff/xprepare: use the XDF_DIFF_ALG() macro to access flag bits
Commit 307ab20b3 ("xdiff: PATIENCE/HISTOGRAM are not independent option
bits", 19-02-2012) introduced the XDF_DIFF_ALG() macro to access the
flag bits used to represent the diff algorithm requested. In addition,
code which had used explicit manipulation of the flag bits was changed
to use the macros.

However, one example of direct manipulation remains. Update this code to
use the XDF_DIFF_ALG() macro.

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-04 15:51:06 -08:00
Junio C Hamano
c1fa85ff8c Merge branch 'ps/plug-xdl-merge-leak'
* ps/plug-xdl-merge-leak:
  xdiff/xmerge: fix memory leak in xdl_merge
2016-02-26 13:37:22 -08:00
Junio C Hamano
18b26b18c5 Merge branch 'jk/no-diff-emit-common'
"git merge-tree" used to mishandle "both sides added" conflict with
its own "create a fake ancestor file that has the common parts of
what both sides have added and do a 3-way merge" logic; this has
been updated to use the usual "3-way merge with an empty blob as
the fake common ancestor file" approach used in the rest of the
system.

* jk/no-diff-emit-common:
  xdiff: drop XDL_EMIT_COMMON
  merge-tree: drop generate_common strategy
  merge-one-file: use empty blob for add/add base
2016-02-26 13:37:14 -08:00
Patrick Steinhardt
4867f1184c xdiff/xmerge: fix memory leak in xdl_merge
When building the script for the second file that is to be merged
we have already allocated memory for data structures related to
the first file. When we encounter an error in building the second
script we only free allocated memory related to the second file
before erroring out.

Fix this memory leak by also releasing allocated memory related
to the first file.

Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-23 12:58:26 -08:00
Jeff King
907681e940 xdiff: drop XDL_EMIT_COMMON
There are no more callers that use this mode, and none
likely to be added (as our xdl_merge() eliminates the common
use of it for generating 3-way merge bases).

This is effectively a revert of a9ed376 (xdiff: generate
"anti-diffs" aka what is common to two files, 2006-06-28),
though of course trying to revert that ancient commit
directly produces many textual conflicts.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 22:36:09 -08:00
Johannes Schindelin
15980deab9 merge-file: ensure that conflict sections match eol style
In the previous patch, we made sure that the conflict markers themselves
match the end-of-line style of the input files. However, this still left
out the conflicting text itself: if it lacks a trailing newline, we
add one, and should add a carriage return when appropriate, too.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-27 10:21:53 -08:00
Johannes Schindelin
86efa21527 merge-file: let conflict markers match end-of-line style of the context
When merging files with CR/LF line endings, the conflict markers should
match those, lest the output file has mixed line endings.

This is particularly of interest on Windows, where some editors get
*really* confused by mixed line endings.

The original version of this patch by Beat Bolli respected core.eol, and
a subsequent improvement by this developer also respected gitattributes.
This approach was suboptimal, though: `git merge-file` was invented as a
drop-in replacement for GNU merge and as such has no problem operating
outside of any repository at all!

Another problem with the original approach was pointed out by Junio
Hamano: legacy repositories might have their text files committed using
CR/LF line endings (and core.eol and the gitattributes would give us a
false impression there). Therefore, the much superior approach is to
simply match the context's line endings, if any.

We actually do not have to look at the *entire* context at all: if the
files are all LF-only, or if they all have CR/LF line endings, it is
sufficient to look at just a *single* line to match that style. And if
the line endings are mixed anyway, it is *still* okay to imitate just a
single line's eol: we will just add to the pile of mixed line endings,
and there is nothing we can do about that.

So what we do is: we look at the line preceding the conflict, falling
back to the line preceding that in case it was the last line and had no
line ending, falling back to the first line, first in the first
post-image, then the second post-image, and finally the pre-image.
If we find consistent CR/LF (or undecided) end-of-line style, we match
that, otherwise we use LF-only line endings for the conflict markers.

Note that while it is true that there have to be at least two lines we
can look at (otherwise there would be no conflict), the same is not true
for line *endings*: the three files in question could all consist of a
single line without any line ending, each. In this case we fall back to
using LF-only.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-27 10:21:26 -08:00
Max Kirillov
ba311807f8 git-merge-file: do not add LF at EOF while applying unrelated change
If 'current-file' does not contain LF at EOF, and change between
'base-file' and 'other-file' does not change any line close to EOF, the
3-way merge should not add LF to EOF.  This is what 'diff3 -m' does, and
seems to be a reasonable expectation.

The change which introduced the behavior is cd1d61c44f. It always calls
function xdl_recs_copy() for sides with add_nl == 1. In fact, it looks
like the only case when this is needed is when 2 files are being
union-merged, and they do not have LF at EOF (strictly speaking, the
first of them).

Add tests:
* "merge without conflict (missing LF at EOF, away from change in the
other file)" and "merge does not add LF away of change", to demonstrate
the changed behavior.
* "conflict at EOF without LF resolved by --union", to verify that the
union-merge at the end inerts newline between versions.
* some more tests which I felt like not covering the functionality well

Signed-off-by: Max Kirillov <max@max630.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-30 14:07:58 -07:00
Junio C Hamano
c01499ef69 C: have space around && and || operators
Correct all hits from

    git grep -e '\(&&\|||\)[^ ]' -e '[^	 ]\(&&\|||\)' -- '*.c'

i.e. && or || operators that are followed by anything but a SP,
or that follow something other than a SP or a HT, so that these
operators have a SP around it when necessary.

We usually refrain from making this kind of a tree-wide change in
order to avoid unnecessary conflicts with other "real work" patches,
but in this case, the end result does not have a potentially
cumbersome tree-wide impact, while this is a tree-wide cleanup.

Fixes to compat/regex/regcomp.c and xdiff/xemit.c are to replace a
HT immediately after && with a SP.

This is based on Felipe's patch to bultin/symbolic-ref.c; I did all
the finding out what other files in the whole tree need to be fixed
and did the fix and also the log message while reviewing that single
liner, so any screw-ups in this version are mine.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-16 10:26:39 -07:00
Antoine Pelisse
36617af7ed diff: add --ignore-blank-lines option
The goal of the patch is to introduce the GNU diff
-B/--ignore-blank-lines as closely as possible. The short option is not
available because it's already used for "break-rewrites".

When this option is used, git-diff will not create hunks that simply
add or remove empty lines, but will still show empty lines
addition/suppression if they are close enough to "valuable" changes.

There are two differences between this option and GNU diff -B option:
- GNU diff doesn't have "--inter-hunk-context", so this must be handled
- The following sequence looks like a bug (context is displayed twice):

    $ seq 5 >file1
    $ cat <<EOF >file2
    change
    1
    2

    3
    4
    5
    change
    EOF
    $ diff -u -B file1 file2
    --- file1	2013-06-08 22:13:04.471517834 +0200
    +++ file2	2013-06-08 22:13:23.275517855 +0200
    @@ -1,5 +1,7 @@
    +change
     1
     2
    +
     3
     4
     5
    @@ -3,3 +5,4 @@
     3
     4
     5
    +change

So here is a more thorough description of the option:
- real changes are interesting
- blank lines that are close enough (less than context size) to
interesting changes are considered interesting (recursive definition)
- "context" lines are used around each hunk of interesting changes
- If two hunks are separated by less than "inter-hunk-context", they
will be merged into one.

The implementation does the "interesting changes selection" in a single
pass.

Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-19 15:17:45 -07:00
Stefano Lattarini
41ccfdd9c9 Correct common spelling mistakes in comments and tests
Most of these were found using Lucas De Marchi's codespell tool.

Signed-off-by: Stefano Lattarini <stefano.lattarini@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-12 13:38:40 -07:00
Junio C Hamano
0bc8bea2b4 Merge branch 'rs/xdiff-fast-hash-fix'
Fixes compilation issue on 32-bit in an earlier series.
2012-05-25 12:05:02 -07:00
René Scharfe
8072766cc6 xdiff: import new 32-bit version of count_masked_bytes()
Import the latest 32-bit implementation of count_masked_bytes() from
Linux (arch/x86/include/asm/word-at-a-time.h).  It's shorter and avoids
overflows and negative numbers.

This fixes test failures on 32-bit, where negative partial results had
been shifted right using the "wrong" method (logical shift right instead
of arithmetic short right).  The compiler is free to chose the method,
so it was only wrong in the sense that it didn't work as intended by us.

Reported-by: Øyvind A. Holm <sunny@sunbase.org>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-23 09:10:17 -07:00
René Scharfe
7e356a9794 xdiff: avoid more compiler warnings with XDL_FAST_HASH on 32-bit machines
Hide literals that can cause compiler warnings for 32-bit architectures in
expressions that evaluate to small numbers there.  Some compilers warn that
0x0001020304050608 won't fit into a 32-bit long, others that shifting right
by 56 bits clears a 32-bit value completely.

The correct values are calculated in the 64-bit case, which is all that matters
in this if-branch.

Reported-by: Øyvind A. Holm <sunny@sunbase.org>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Acked-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-23 09:10:03 -07:00
René Scharfe
9322ce21ee xdiff: avoid compiler warnings with XDL_FAST_HASH on 32-bit machines
Import macro REPEAT_BYTE from Linux (arch/x86/include/asm/word-at-a-time.h)
to avoid 64-bit integer literals, which cause some 32-bit compilers to
print warnings.

Reported-by: Øyvind A. Holm <sunny@sunbase.org>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-22 14:39:49 -07:00
René Scharfe
be89977543 xdiff: remove unused functions
The functions xdl_cha_first(), xdl_cha_next() and xdl_atol() are not used
by us.  While removing them increases the difference to the upstream
version of libxdiff, it only adds a bit to the more than 600 differing
lines in xutils.c (mmfile_t management was simplified significantly when
the library was imported initially).  Besides, if upstream modifies these
functions in the future, we won't need to think about importing those
changes, so in that sense it makes tracking modifications easier.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-09 14:13:05 -07:00
René Scharfe
3319e60633 xdiff: remove emit_func() and xdi_diff_hunks()
The functions are unused now, remove them.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-09 14:08:42 -07:00
René Scharfe
467d348c19 xdiff: add hunk_func()
Add a way to register a callback function that is gets passed the
start line and line count of each hunk of a diff.  Only standard
types are used.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-09 14:00:15 -07:00
Junio C Hamano
4d1f0ef210 Merge branch 'tr/xdiff-fast-hash'
Use word-at-a-time comparison to find end of line or NUL (end of buffer),
borrowed from the linux-kernel discussion.

By Thomas Rast
* tr/xdiff-fast-hash:
  xdiff: choose XDL_FAST_HASH code on sizeof(long) instead of __WORDSIZE
  xdiff: load full words in the inner loop of xdl_hash_record
2012-05-02 13:54:58 -07:00
Thomas Rast
6f1af028ce xdiff: choose XDL_FAST_HASH code on sizeof(long) instead of __WORDSIZE
Darwin does not define __WORDSIZE, and compiles the 32-bit code path
on 64-bit systems, resulting in a totally broken git.

I could not find an alternative -- other than the platform symbols
(__x86_64__ etc.) -- that does the test in the preprocessor.  However,
we can also just test for the size of a 'long', which is what really
matters here.  Any compiler worth its salt will leave only the branch
relevant for its platform, and indeed on Linux/GCC the numbers don't
change:

 Test                                  tr/darwin-xdl-fast-hash   origin/next              origin/master
 ------------------------------------------------------------------------------------------------------------------
 4000.1: log -3000 (baseline)          0.09(0.07+0.01)           0.09(0.07+0.01) -5.5%*   0.09(0.07+0.01) -4.1%
 4000.2: log --raw -3000 (tree-only)   0.47(0.41+0.05)           0.47(0.40+0.05) -0.5%    0.45(0.38+0.06) -3.5%.
 4000.3: log -p -3000 (Myers)          1.81(1.67+0.12)           1.81(1.67+0.13) +0.3%    1.99(1.84+0.12) +10.2%***
 4000.4: log -p -3000 --histogram      1.79(1.66+0.11)           1.80(1.67+0.11) +0.4%    1.96(1.82+0.10) +9.2%***
 4000.5: log -p -3000 --patience       2.17(2.02+0.13)           2.20(2.04+0.13) +1.3%.   2.33(2.18+0.13) +7.4%***
 ------------------------------------------------------------------------------------------------------------------
 Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001

Noticed-by: Brian Gernhardt <brian@gernhardtsoftware.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-01 12:19:06 -07:00
Junio C Hamano
86c340e082 Merge branch 'jc/diff-algo-cleanup'
Resurrects the preparatory clean-up patches from another topic that was
discarded, as this would give a saner foundation to build on diff.algo
configuration option series.

* jc/diff-algo-cleanup:
  xdiff: PATIENCE/HISTOGRAM are not independent option bits
  xdiff: remove XDL_PATCH_* macros
2012-04-15 22:51:15 -07:00
Thomas Rast
6942efcfa9 xdiff: load full words in the inner loop of xdl_hash_record
Redo the hashing loop in xdl_hash_record in a way that loads an entire
'long' at a time, using masking tricks to see when and where we found
the terminating '\n'.

I stole inspiration and code from the posts by Linus Torvalds around

  https://lkml.org/lkml/2012/3/2/452
  https://lkml.org/lkml/2012/3/5/6

His method reads the buffers in sizeof(long) increments, and may thus
overrun it by at most sizeof(long)-1 bytes before it sees the final
newline (or hits the buffer length check).  I considered padding out
all buffers by a suitable amount to "catch" the overrun, but

* this does not work for mmap()'d buffers: if you map 4096+8 bytes
  from a 4096 byte file, accessing the last 8 bytes results in a
  SIGBUS on my machine; and

* it would also be extremely ugly because it intrudes deep into the
  unpacking machinery.

So I adapted it to not read beyond the buffer at all.  Instead, it
reads the final partial word byte-by-byte and strings it together.
Then it can use the same logic as before to finish the hashing.

So far we enable this only on x86_64, where it provides nice speedup
for diff-related work:

  Test                                  origin/next      tr/xdiff-fast-hash
  -----------------------------------------------------------------------------
  4000.1: log -3000 (baseline)          0.07(0.05+0.02)  0.08(0.06+0.02) +14.3%
  4000.2: log --raw -3000 (tree-only)   0.37(0.33+0.04)  0.37(0.32+0.04) +0.0%
  4000.3: log -p -3000 (Myers)          1.75(1.65+0.09)  1.60(1.49+0.10) -8.6%
  4000.4: log -p -3000 --histogram      1.73(1.62+0.09)  1.58(1.49+0.08) -8.7%
  4000.5: log -p -3000 --patience       2.11(2.00+0.10)  1.94(1.80+0.11) -8.1%

Perhaps other platforms could also benefit.  However it does NOT work
on big-endian systems!

[jc: minimum style and compilation fixes]

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-09 17:03:25 -07:00
Junio C Hamano
307ab20b33 xdiff: PATIENCE/HISTOGRAM are not independent option bits
Because the default Myers, patience and histogram algorithms cannot be in
effect at the same time, XDL_PATIENCE_DIFF and XDL_HISTOGRAM_DIFF are not
independent bits.  Instead of wasting one bit per algorithm, define a few
macros to access the few bits they occupy and update the code that access
them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-19 15:36:55 -08:00
Junio C Hamano
e5b06629de xdiff: remove XDL_PATCH_* macros
These are not used anywhere in our codebase, and the bit assignment
definition is merely confusing.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-19 14:32:25 -08:00
Junio C Hamano
86e15ff4fe Merge branch 'rs/diff-postimage-in-context'
* rs/diff-postimage-in-context:
  xdiff: print post-image for common records instead of pre-image
2012-01-29 13:18:55 -08:00
René Scharfe
baf5aaa333 xdiff: print post-image for common records instead of pre-image
Normally it doesn't matter if we show the pre-image or th post-image
for the common parts of a diff because they are the same.  If
white-space changes are ignored they can differ, though.  The
new text after applying the diff is more interesting in that case,
so show that instead of the old contents.

Note: GNU diff shows the pre-image.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-06 11:10:05 -08:00
Junio C Hamano
9b55aa03da Merge branch 'rs/diff-whole-function'
* rs/diff-whole-function:
  diff: add option to show whole functions as context
  xdiff: factor out get_func_line()
2011-10-19 10:49:13 -07:00
Junio C Hamano
7a63a920fd Merge branch 'rs/diff-cleanup-records-fix'
* rs/diff-cleanup-records-fix:
  diff: resurrect XDF_NEED_MINIMAL with --minimal
  Revert removal of multi-match discard heuristic in 27af01
2011-10-13 19:03:22 -07:00
René Scharfe
14937c2c06 diff: add option to show whole functions as context
Add the option -W/--function-context to git diff.  It is similar to
the same option of git grep and expands the context of change hunks
so that the whole surrounding function is shown.  This "natural"
context can allow changes to be understood better.

Note: GNU patch doesn't like diffs generated with the new option;
it seems to expect context lines to be the same before and after
changes.  git apply doesn't complain.

This implementation has the same shortcoming as the one in grep,
namely that there is no way to explicitly find the end of a
function.  That means that a few lines of extra context are shown,
right up to the next recognized function begins.  It's already
useful in its current form, though.

The function get_func_line() in xdiff/xemit.c is extended to work
forward as well as backward to find post-context as well as
pre-context.  It returns the position of the first found matching
line.  The func_line parameter is made optional, as we don't need
it for -W.

The enhanced function is then used in xdl_emit_diff() to extend
the context as needed.  If the added context overlaps with the
next change, it is merged into the current hunk.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-10 12:05:07 -07:00
René Scharfe
f99f4b3667 xdiff: factor out get_func_line()
Move the code to search for a function line to be shown in the hunk
header into its own function and to make returning the length-limited
result string easier, introduce struct func_line.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-10 11:59:30 -07:00
René Scharfe
c5aa90682f Revert removal of multi-match discard heuristic in 27af01
27af01d (xdiff/xprepare: improve O(n*m) performance in
xdl_cleanup_records(), 2011-08-17) was supposed to be a performance
boost only. However, it unexpectedly changed the behaviour of diff.

Revert a part of 27af01d that removes logic that mark lines as
"multi-match" (ie. dis[i] == 2). This was preventing the multi-match
discard heuristic (performed in xdl_cleanup_records() and
xdl_clean_mmatch()) from executing.

Reported-by: Alexander Pepper <pepper@inf.fu-berlin.de>
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-26 11:38:14 -07:00
Junio C Hamano
b648557ef1 Merge branch 'rc/histogram-diff'
* rc/histogram-diff:
  xdiff/xprepare: initialise xdlclassifier_t cf in xdl_prepare_env()
2011-09-06 11:42:58 -07:00
Tay Ray Chuan
2738bc3f09 xdiff/xprepare: initialise xdlclassifier_t cf in xdl_prepare_env()
Ensure that the xdl_free_classifier() call on xdlclassifier_t cf is safe
even if xdl_init_classifier() isn't called. This may occur in the case
where diff is run with --histogram and a call to, say, xdl_prepare_ctx()
fails.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-31 10:03:51 -07:00
Junio C Hamano
b14b969ab9 Merge branch 'rc/histogram-diff' into HEAD
* rc/histogram-diff:
  xdiff/xhistogram: drop need for additional variable
  xdiff/xhistogram: rely on xdl_trim_ends()
  xdiff/xhistogram: rework handling of recursed results
  xdiff: do away with xdl_mmfile_next()
  Make test number unique
  xdiff/xprepare: use a smaller sample size for histogram diff
  xdiff/xprepare: skip classification
  teach --histogram to diff
  t4033-diff-patience: factor out tests
  xdiff/xpatience: factor out fall-back-diff function
  xdiff/xprepare: refactor abort cleanups
  xdiff/xprepare: use memset()

Conflicts:
	xdiff/xprepare.c
2011-08-17 17:17:16 -07:00
Tay Ray Chuan
27af01d552 xdiff/xprepare: improve O(n*m) performance in xdl_cleanup_records()
In xdl_cleanup_records(), we see O(n*m) performance, where n is the
number of records from xdf->dstart to xdf->dend, and m is the size of a
bucket in xdf->rhash (<= by mlim).

Here, we improve this to O(n) by pre-computing nm (in rcrec->len(1|2))
in xdl_classify_record().

Reported-by: Marat Radchenko <marat@slonopotamus.org>
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-17 17:15:05 -07:00
Tay Ray Chuan
6486a84cb8 xdiff/xhistogram: drop need for additional variable
Having an additional variable (ptr) instead of changing line(1|2) and
count(1|2) was for debugging purposes.

Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-08 13:00:17 -07:00