Commit graph

702 commits

Author SHA1 Message Date
Junio C Hamano 55bb5c9147 zlib: wrap deflate side of the API
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use
of deflateInit2 in remote-curl.c to tell the library to use gzip header
and trailer in git_deflate_init_gzip().

There is only one caller that cares about the status from deflateEnd().
Introduce git_deflate_end_gently() to let that sole caller retrieve the
status and act on it (i.e. die) for now, but we would probably want to
make inflate_end/deflate_end die when they ran out of memory and get
rid of the _gently() kind.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:10:29 -07:00
Junio C Hamano b3c89315a3 Merge branch 'jc/rename-degrade-cc-to-c' into maint
* jc/rename-degrade-cc-to-c:
  diffcore-rename: fall back to -C when -C -C busts the rename limit
  diffcore-rename: record filepair for rename src
  diffcore-rename: refactor "too many candidates" logic
  builtin/diff.c: remove duplicated call to diff_result_code()
2011-05-31 12:00:02 -07:00
Junio C Hamano d9ac3e41c3 Merge branch 'jm/maint-diff-words-with-sbe' into maint
* jm/maint-diff-words-with-sbe:
  do not read beyond end of malloc'd buffer
2011-05-26 09:43:00 -07:00
Jim Meyering 42536dd9b9 do not read beyond end of malloc'd buffer
With diff.suppress-blank-empty=true, "git diff --word-diff" would
output data that had been read from uninitialized heap memory.
The problem was that fn_out_consume did not account for the
possibility of a line with length 1, i.e., the empty context line
that diff.suppress-blank-empty=true converts from " \n" to "\n".
Since it assumed there would always be a prefix character (the space),
it decremented "len" unconditionally, thus passing len=0 to emit_line,
which would then blindly call emit_line_0 with len=-1 which would
pass that value on to fwrite as SIZE_MAX.  Boom.

Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 11:39:49 -07:00
Junio C Hamano a613b534bc Merge branch 'jc/fix-diff-files-unmerged' into maint
* jc/fix-diff-files-unmerged:
  diff-files: show unmerged entries correctly
  diff: remove often unused parameters from diff_unmerge()
  diff.c: return filepair from diff_unmerge()
  test: use $_z40 from test-lib
2011-05-13 10:41:54 -07:00
Junio C Hamano f5bf1b5f6b Merge branch 'jh/dirstat' into maint
* jh/dirstat:
  --dirstat: In case of renames, use target filename instead of source filename
  Teach --dirstat not to completely ignore rearranged lines within a file
  --dirstat-by-file: Make it faster and more correct
  --dirstat: Describe non-obvious differences relative to --stat or regular diff
2011-05-04 14:59:07 -07:00
Junio C Hamano fa7b290895 diff: remove often unused parameters from diff_unmerge()
e9c8409 (diff-index --cached --raw: show tree entry on the LHS for
unmerged entries., 2007-01-05) added a <mode, object name> pair as
parameters to this function, to store them in the pre-image side of an
unmerged file pair.  Now the function is fixed to return the filepair it
queued, we can make the caller on the special case codepath to do so.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-23 22:34:43 -07:00
Junio C Hamano 76399c0195 diff.c: return filepair from diff_unmerge()
The underlying diff_queue() returns diff_filepair so that the caller can
further add information to it, and the helper function diff_unmerge()
utilizes the feature itself, but does not expose it to its callers, which
was kind of selfish.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-23 22:34:43 -07:00
Johan Herland 2ca8671470 --dirstat: In case of renames, use target filename instead of source filename
This changes --dirstat analysis to count "damage" toward the target filename,
rather than the source filename. For renames within a directory, this won't
matter to the final output, but when moving files between diretories, the
output now lists the target directory rather than the source directory.

Signed-off-by: Johan Herland <johan@herland.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-12 11:29:34 -07:00
Johan Herland 2ff3a80334 Teach --dirstat not to completely ignore rearranged lines within a file
Currently, the --dirstat analysis ignores when lines within a file are
rearranged, because the "damage" calculated by show_dirstat() is 0.
However, if the object name has changed, we already know that there is
some damage, and it is unintuitive to claim there is _no_ damage.

Teach show_dirstat() to assign a minimum amount of damage (== 1) to
entries for which the analysis otherwise yields zero damage, to still
represent that these files are changed, instead of saying that there
is no change.

Also, skip --dirstat analysis when the object names are the same (e.g. for
a pure file rename).

Signed-off-by: Johan Herland <johan@herland.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-11 11:16:15 -07:00
Johan Herland 0133dab75d --dirstat-by-file: Make it faster and more correct
Currently, when using --dirstat-by-file, it first does the full --dirstat
analysis (using diffcore_count_changes()), and then resets 'damage' to 1,
if any damage was found by diffcore_count_changes().

But --dirstat-by-file is not interested in the file damage per se. It only
cares if the file changed at all. In that sense it only cares if the blob
object for a file has changed. We therefore only need to compare the
object names of each file pair in the diff queue and we can skip the
entire --dirstat analysis and simply set 'damage' to 1 for each entry
where the object name has changed.

This makes --dirstat-by-file faster, and also bypasses --dirstat's practice
of ignoring rearranged lines within a file.

The patch also contains an added testcase verifying that --dirstat-by-file
now detects changes that only rearrange lines within a file.

Signed-off-by: Johan Herland <johan@herland.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-11 10:12:24 -07:00
Junio C Hamano f31027c99c diffcore-rename: fall back to -C when -C -C busts the rename limit
When there are too many paths in the project, the number of rename source
candidates "git diff -C -C" finds will exceed the rename detection limit,
and no inexact rename detection is performed.  We however could fall back
to "git diff -C" if the number of modified paths is sufficiently small.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-22 14:29:07 -07:00
Johannes Schindelin c0aa335c95 Remove unused variables
Noticed by gcc 4.6.0.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-22 11:43:27 -07:00
Stephen Boyd c2e86addb8 Fix sparse warnings
Fix warnings from 'make check'.

 - These files don't include 'builtin.h' causing sparse to complain that
   cmd_* isn't declared:

   builtin/clone.c:364, builtin/fetch-pack.c:797,
   builtin/fmt-merge-msg.c:34, builtin/hash-object.c:78,
   builtin/merge-index.c:69, builtin/merge-recursive.c:22
   builtin/merge-tree.c:341, builtin/mktag.c:156, builtin/notes.c:426
   builtin/notes.c:822, builtin/pack-redundant.c:596,
   builtin/pack-refs.c:10, builtin/patch-id.c:60, builtin/patch-id.c:149,
   builtin/remote.c:1512, builtin/remote-ext.c:240,
   builtin/remote-fd.c:53, builtin/reset.c:236, builtin/send-pack.c:384,
   builtin/unpack-file.c:25, builtin/var.c:75

 - These files have symbols which should be marked static since they're
   only file scope:

   submodule.c:12, diff.c:631, replace_object.c:92, submodule.c:13,
   submodule.c:14, trace.c:78, transport.c:195, transport-helper.c:79,
   unpack-trees.c:19, url.c:3, url.c:18, url.c:104, url.c:117, url.c:123,
   url.c:129, url.c:136, thread-utils.c:21, thread-utils.c:48

 - These files redeclare symbols to be different types:

   builtin/index-pack.c:210, parse-options.c:564, parse-options.c:571,
   usage.c:49, usage.c:58, usage.c:63, usage.c:72

 - These files use a literal integer 0 when they really should use a NULL
   pointer:

   daemon.c:663, fast-import.c:2942, imap-send.c:1072, notes-merge.c:362

While we're in the area, clean up some unused #includes in builtin files
(mostly exec_cmd.h).

Signed-off-by: Stephen Boyd <bebarino@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-22 10:16:54 -07:00
Junio C Hamano 0ce6a51b43 Merge branch 'jk/merge-rename-ux'
* jk/merge-rename-ux:
  pull: propagate --progress to merge
  merge: enable progress reporting for rename detection
  add inexact rename detection progress infrastructure
  commit: stop setting rename limit
  bump rename limit defaults (again)
  merge: improve inexact rename limit warning
2011-03-19 23:23:56 -07:00
Junio C Hamano 4e530c5049 Merge branch 'jk/diffstat-binary' into maint
* jk/diffstat-binary:
  diff: don't retrieve binary blobs for diffstat
  diff: handle diffstat of rewritten binary files
2011-03-16 16:47:26 -07:00
Jonathan Nieder 9cba13ca5d standardize brace placement in struct definitions
In a struct definitions, unlike functions, the prevailing style is for
the opening brace to go on the same line as the struct name, like so:

 struct foo {
	int bar;
	char *baz;
 };

Indeed, grepping for 'struct [a-z_]* {$' yields about 5 times as many
matches as 'struct [a-z_]*$'.

Linus sayeth:

 Heretic people all over the world have claimed that this inconsistency
 is ...  well ...  inconsistent, but all right-thinking people know that
 (a) K&R are _right_ and (b) K&R are right.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-16 12:49:02 -07:00
Jeff King abb371a1ef diff: don't retrieve binary blobs for diffstat
We only need the size, which is much cheaper to get,
especially if it is a big binary file.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-22 10:58:18 -08:00
Jeff King ded0abc73c diff: handle diffstat of rewritten binary files
The logic in builtin_diffstat assumes that a
complete_rewrite pair should have its lines counted. This is
nonsensical for binary files and leads to confusing things
like:

  $ git diff --stat --summary HEAD^ HEAD
   foo.rand |  Bin 4096 -> 4096 bytes
   1 files changed, 0 insertions(+), 0 deletions(-)

  $ git diff --stat --summary -B HEAD^ HEAD
   foo.rand |   34 +++++++++++++++-------------------
   1 files changed, 15 insertions(+), 19 deletions(-)
   rewrite foo.rand (100%)

So let's reorder the function to handle binary files first
(which from diffstat's perspective look like complete
rewrites anyway), then rewrites, then actual diffstats.

There are two bonus prizes to this reorder:

  1. It gets rid of a now-superfluous goto.

  2. The binary case is at the top, which means we can
     further optimize it in the next patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-22 10:57:58 -08:00
Jeff King 92c57e5c1d bump rename limit defaults (again)
We did this once before in 5070591 (bump rename limit
defaults, 2008-04-30). Back then, we were shooting for about
1 second for a diff/log calculation, and 5 seconds for a
merge.

There are a few new things to consider, though:

  1. Average processors are faster now.

  2. We've seen on the mailing list some ugly merges where
     not using inexact rename detection leads to many more
     conflicts. Merges of this size take a long time
     anyway, so users are probably happy to spend a little
     bit of time computing the renames.

Let's bump the diff/merge default limits from 200/500 to
400/1000. Those are 2 seconds and 10 seconds respectively on
my modern hardware.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-21 10:23:36 -08:00
Junio C Hamano 6ae7a51a2e Merge branch 'ks/blame-worktree-textconv-cached'
* ks/blame-worktree-textconv-cached:
  fill_textconv(): Don't get/put cache if sha1 is not valid
  t/t8006: Demonstrate blame is broken when cachetextconv is on
2010-12-21 14:30:52 -08:00
Kirill Smelkov 9ec09b0495 fill_textconv(): Don't get/put cache if sha1 is not valid
When blaming files in the working tree, the filespec is marked with
!sha1_valid, as we have not given the contents an object name yet.  The
function to cache textconv results (keyed on the object name), however,
didn't check this condition, and ended up on storing the cached result
under a random object name.

Cc: Axel Bonnet <axel.bonnet@ensimag.imag.fr>
Cc: Clément Poulain <clement.poulain@ensimag.imag.fr>
Cc: Diane Gasselin <diane.gasselin@ensimag.imag.fr>
Cc: Jeff King <peff@peff.net>
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-19 18:41:32 -08:00
Junio C Hamano cf7a64b54a Merge branch 'kb/diff-C-M-synonym'
* kb/diff-C-M-synonym:
  diff: use "find" instead of "detect" as prefix for long forms of -M and -C
  diff: add --detect-copies-harder as a synonym for --find-copies-harder
2010-12-16 12:58:59 -08:00
Yann Dirson f611ddc774 diff: use "find" instead of "detect" as prefix for long forms of -M and -C
It is more consistent with existing --find-copies-harder; luckily "detect"
variant has not appeared in any officially released version of git.

Signed-off-by: Yann Dirson <ydirson@altern.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-10 13:52:05 -08:00
Junio C Hamano 8577def6fc Merge branch 'np/diff-in-corrupt-repository' into maint
* np/diff-in-corrupt-repository:
  diff: don't presume empty file when corresponding object is missing
2010-12-09 10:36:39 -08:00
Junio C Hamano ae0a37cd6b Merge branch 'cm/diff-check-at-eol' into maint
* cm/diff-check-at-eol:
  diff --check: correct line numbers of new blank lines at EOF
2010-12-09 10:36:10 -08:00
Junio C Hamano f04aa35eb6 Merge branch 'jk/diff-CBM'
* jk/diff-CBM:
  diff: report bogus input to -C/-M/-B
2010-12-08 11:24:11 -08:00
Junio C Hamano f5a5531e4e Merge branch 'np/diff-in-corrupt-repository'
* np/diff-in-corrupt-repository:
  diff: don't presume empty file when corresponding object is missing
2010-11-29 17:52:33 -08:00
Junio C Hamano 039e84e30d Merge branch 'cm/diff-check-at-eol'
* cm/diff-check-at-eol:
  diff --check: correct line numbers of new blank lines at EOF
2010-11-29 17:52:31 -08:00
Kevin Ballard 150a5daad0 diff: add --detect-copies-harder as a synonym for --find-copies-harder
Signed-off-by: Kevin Ballard <kevin@sb.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-29 16:58:27 -08:00
Junio C Hamano 9cffe2018a Merge branch 'cb/diff-fname-optim' into maint
* cb/diff-fname-optim:
  diff: avoid repeated scanning while looking for funcname
  do not search functions for patch ID
  add rebase patch id tests
2010-11-24 12:46:26 -08:00
Junio C Hamano 78bce6c7e9 Merge branch 'jk/no-textconv-symlink' into maint
* jk/no-textconv-symlink:
  diff: don't use pathname-based diff drivers for symlinks
2010-11-24 12:46:20 -08:00
Junio C Hamano 8cf666c9ee Merge branch 'cb/diff-fname-optim'
* cb/diff-fname-optim:
  diff: avoid repeated scanning while looking for funcname
  do not search functions for patch ID
  add rebase patch id tests
2010-11-17 14:59:16 -08:00
Junio C Hamano 6a2e93f107 Merge branch 'jk/no-textconv-symlink'
* jk/no-textconv-symlink:
  diff: don't use pathname-based diff drivers for symlinks
2010-11-17 14:59:10 -08:00
Junio C Hamano 329351feeb Merge branch 'kb/merge-recursive-rename-threshold'
* kb/merge-recursive-rename-threshold:
  diff: add synonyms for -M, -C, -B
  merge-recursive: option to specify rename threshold

Conflicts:
	Documentation/diff-options.txt
	Documentation/merge-strategies.txt
2010-10-26 21:54:04 -07:00
Junio C Hamano d7806967bd Merge branch 'maint'
* maint:
  Fix copy-pasted comments related to tree diff handling.
2010-10-26 15:04:05 -07:00
Yann Dirson c3fced6498 Fix copy-pasted comments related to tree diff handling.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-25 00:21:56 -07:00
Nicolas Pitre c50c4316e1 diff: don't presume empty file when corresponding object is missing
The low-level diff code will happily produce totally bogus diff output
with a broken repository via format-patch and friends by treating missing
objects as empty files.  Let's prevent that from happening any longer.

Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-21 22:23:34 -07:00
Jeff King 07cd726527 diff: report bogus input to -C/-M/-B
We already detect invalid input to these functions, but we
simply exit with an error code, never saying anything as
simple as "your input was wrong". Let's fix that.

Before:

  $ git diff -CM
  $ echo $?
  128

After:

  $ git diff -CM
  error: invalid argument to -C: M
  $ echo $?
  128

There should be no problems with having diff_opt_parse print
to stderr, as there is already precedent in complaining
about bogus --color and --output arguments.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-21 15:44:53 -07:00
Christoph Mallon 8837d33595 diff --check: correct line numbers of new blank lines at EOF
The whitespace check printed the value of the wrong variable, i.e. the
beginning of the block of blank lines at the EOF (possibly absent) in the
old file.

As "git diff --check" is used by users to check their changes before
making a commit, we should point at the line number in the file after
the change.

Signed-off-by: Christoph Mallon <christoph.mallon@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-16 18:57:35 -07:00
Junio C Hamano b3c16ee454 Merge branch 'jc/pickaxe-grep'
* jc/pickaxe-grep:
  diff/log -G<pattern>: tests
  git log/diff: add -G<regexp> that greps in the patch text
  diff: pass the entire diff-options to diffcore_pickaxe()
  gitdiffcore doc: update pickaxe description
2010-09-29 13:49:03 -07:00
Matthieu Moy 9ec26eb7cd diff: trivial fix for --output file error message
The option argument is either after the equal sign in --output=... or in
the next command-line argument. optarg is the reliable way to access it.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-29 13:25:17 -07:00
Kevin Ballard 37ab5156ae diff: add synonyms for -M, -C, -B
Add new long-form options --detect-renames[=<n>], --detect-copies[=<n>],
and --break-rewrites[=[<n>][/<m>]] as synonyms for the -M, -C, and -B
options (respectively).

Signed-off-by: Kevin Ballard <kevin@sb.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-29 13:18:04 -07:00
Kevin Ballard 10ae7526be merge-recursive: option to specify rename threshold
The recursive merge strategy turns on rename detection but leaves the
rename threshold at the default. Add a strategy option to allow the user
to specify a rename threshold to use.

Signed-off-by: Kevin Ballard <kevin@sb.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-29 13:15:56 -07:00
Clemens Buchacher ad14b450c0 do not search functions for patch ID
Visual aids, such as the function name in the hunk
header, are not necessary for the purposes of
computing a patch ID.

This is a performance optimization.

Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-23 18:35:07 -07:00
Jeff King d391c0ff94 diff: don't use pathname-based diff drivers for symlinks
When we're diffing symlinks, we consider the contents to be
the pathname that the symlink points to. When a user sets up
a userdiff driver like "*.pdf diff=pdf", their "diff.pdf.*"
config generally tells us what to do with the content of
pdf files.

With the current code, we will actually process a symlink
like "link.pdf" using a configured pdf driver, meaning we
are using contents which consist of a pathname with
configuration that is expecting contents that consist of an
actual pdf file.

The most noticeable example of this would have been
textconv; however, it was already protected in its own
textconv-specific code path. We can still see the breakage
with something like "diff.*.binary", though. You could
also see it with diff.*.funcname, though it is a bit harder
to trigger accidentally there.

This patch adds a check for S_ISREG lower in the callstack
than the textconv-specific check, which should block use of
any userdiff config for non-regular files. We can drop the
check in the textconv code, which is now redundant.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-23 18:32:32 -07:00
Junio C Hamano 8ac8cf5bc1 Merge branch 'maint'
* maint:
  xdiff-interface.c: always trim trailing space from xfuncname matches
  diff.c: call regfree to free memory allocated by regcomp when necessary
2010-09-09 17:29:40 -07:00
Brandon Casey ef5644ea6e diff.c: call regfree to free memory allocated by regcomp when necessary
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-09-09 17:18:04 -07:00
Junio C Hamano 7cc1e385a0 Merge branch 'cb/binary-patch-id'
* cb/binary-patch-id:
  hash binary sha1 into patch id
2010-08-31 16:24:48 -07:00
Junio C Hamano f506b8e8b5 git log/diff: add -G<regexp> that greps in the patch text
Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the
"git diff" family of commands.  This limits the diff queue to filepairs
whose patch text actually has an added or a deleted line that matches the
given regexp.  Unlike "-S<regexp>", changing other parts of the line that
has a substring that matches the given regexp IS counted as a change, as
such a change would appear as one deletion followed by one addition in a
patch text.

Unlike -S (pickaxe) that is intended to be used to quickly detect a commit
that changes the number of occurrences of hits between the preimage and
the postimage to serve as a part of larger toolchain, this is meant to be
used as the top-level Porcelain feature.

The implementation unfortunately has to run "diff" twice if you are
running "log" family of commands to produce patches in the final output
(e.g. "git log -p" or "git format-patch").  I think we _could_ cache the
result in-core if we wanted to, but that would require larger surgery to
the diffcore machinery (i.e. adding an extra pointer in the filepair
structure to keep a pointer to a strbuf around, stuff the textual diff to
the strbuf inside diffgrep_consume(), and make use of it in later stages
when it is available) and it may not be worth it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-31 14:30:29 -07:00