Commit graph

1036 commits

Author SHA1 Message Date
Junio C Hamano a0d9b7f015 Merge branch 'sb/diff-cleanup'
Code cleanup.

* sb/diff-cleanup:
  diff: remove dead code
  diff: omit found pointer from emit_callback
  diff.c: use diff_options directly
2016-09-15 14:11:16 -07:00
Junio C Hamano 305d7f1339 Merge branch 'jk/diff-submodule-diff-inline'
The "git diff --submodule={short,log}" mechanism has been enhanced
to allow "--submodule=diff" to show the patch between the submodule
commits bound to the superproject.

* jk/diff-submodule-diff-inline:
  diff: teach diff to display submodule difference with an inline diff
  submodule: refactor show_submodule_summary with helper function
  submodule: convert show_submodule_summary to use struct object_id *
  allow do_submodule_path to work even if submodule isn't checked out
  diff: prepare for additional submodule formats
  graph: add support for --line-prefix on all graph-aware output
  diff.c: remove output_prefix_length field
  cache: add empty_tree_oid object and helper function
2016-09-12 15:34:31 -07:00
Stefan Beller ca9b37e5a8 diff: remove dead code
When `len < 1`, len has to be 0 or negative, emit_line will then remove the
first character and by then `len` would be negative. As this doesn't
happen, it is safe to assume it is dead code.

This continues to simplify the code, which was started in b8d9c1a66b
(2009-09-03,  diff.c: the builtin_diff() deals with only two-file
comparison).

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-08 13:54:37 -07:00
Stefan Beller ba16233ccd diff: omit found pointer from emit_callback
We keep the actual data in the diff options, which are just as accessible.
Remove the pointer stored in struct emit_callback for readability.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-08 13:54:23 -07:00
Stefan Beller fb33b62ca6 diff.c: use diff_options directly
The value of `ecbdata->opt` is accessible via the short variable `o`
already, so let's use that instead.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-08 13:46:46 -07:00
brian m. carlson 99d1a9861a cache: convert struct cache_entry to use struct object_id
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:

@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
Jacob Keller fd47ae6a5b diff: teach diff to display submodule difference with an inline diff
Teach git-diff and friends a new format for displaying the difference
of a submodule. The new format is an inline diff of the contents of the
submodule between the commit range of the update. This allows the user
to see the actual code change caused by a submodule update.

Add tests for the new format and option.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-31 18:07:10 -07:00
Jacob Keller 602a283afb submodule: convert show_submodule_summary to use struct object_id *
Since we're going to be changing this function in a future patch, lets
go ahead and convert this to use object_id now.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-31 18:07:10 -07:00
Jacob Keller 61cfbc054d diff: prepare for additional submodule formats
A future patch will add a new format for displaying the difference of
a submodule. Make it easier by changing how we store the current
selected format. Replace the DIFF_OPT flag with an enumeration, as each
format will be mutually exclusive.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-31 18:07:09 -07:00
Jacob Keller 660e113ce1 graph: add support for --line-prefix on all graph-aware output
Add an extension to git-diff and git-log (and any other graph-aware
displayable output) such that "--line-prefix=<string>" will print the
additional line-prefix on every line of output.

To make this work, we have to fix a few bugs in the graph API that force
graph_show_commit_msg to be used only when you have a valid graph.
Additionally, we extend the default_diff_output_prefix handler to work
even when no graph is enabled.

This is somewhat of a hack on top of the graph API, but I think it
should be acceptable here.

This will be used by a future extension of submodule display which
displays the submodule diff as the actual diff between the pre and post
commit in the submodule project.

Add some tests for both git-log and git-diff to ensure that the prefix
is honored correctly.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-31 18:07:09 -07:00
Junio C Hamano cd48dadb8d diff.c: remove output_prefix_length field
"diff/log --stat" has a logic that determines the display columns
available for the diffstat part of the output and apportions it for
pathnames and diffstat graph automatically.

5e71a84a (Add output_prefix_length to diff_options, 2012-04-16)
added the output_prefix_length field to diff_options structure to
allow this logic to subtract the display columns used for the
history graph part from the total "terminal width"; this matters
when the "git log --graph -p" option is in use.

The field must be set to the number of display columns needed to
show the output from the output_prefix() callback, which is error
prone.  As there is only one user of the field, and the user has the
actual value of the prefix string, let's get rid of the field and
have the user count the display width itself.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-31 18:07:08 -07:00
Junio C Hamano dd610aeda6 Merge branch 'kw/patch-ids-optim'
When "git rebase" tries to compare set of changes on the updated
upstream and our own branch, it computes patch-id for all of these
changes and attempts to find matches. This has been optimized by
lazily computing the full patch-id (which is expensive) to be
compared only for changes that touch the same set of paths.

* kw/patch-ids-optim:
  rebase: avoid computing unnecessary patch IDs
  patch-ids: add flag to create the diff patch id using header only data
  patch-ids: replace the seen indicator with a commit pointer
  patch-ids: stop using a hand-rolled hashmap implementation
2016-08-12 09:47:39 -07:00
Junio C Hamano cee6c5b47b Merge branch 'jk/diff-do-not-reuse-wtf-needs-cleaning' into maint
There is an optimization used in "git diff $treeA $treeB" to borrow
an already checked-out copy in the working tree when it is known to
be the same as the blob being compared, expecting that open/mmap of
such a file is faster than reading it from the object store, which
involves inflating and applying delta.  This however kicked in even
when the checked-out copy needs to go through the convert-to-git
conversion (including the clean filter), which defeats the whole
point of the optimization.  The optimization has been disabled when
the conversion is necessary.

* jk/diff-do-not-reuse-wtf-needs-cleaning:
  diff: do not reuse worktree files that need "clean" conversion
2016-08-10 11:55:28 -07:00
Junio C Hamano 767da54bf8 Merge branch 'jk/diff-do-not-reuse-wtf-needs-cleaning'
There is an optimization used in "git diff $treeA $treeB" to borrow
an already checked-out copy in the working tree when it is known to
be the same as the blob being compared, expecting that open/mmap of
such a file is faster than reading it from the object store, which
involves inflating and applying delta.  This however kicked in even
when the checked-out copy needs to go through the convert-to-git
conversion (including the clean filter), which defeats the whole
point of the optimization.  The optimization has been disabled when
the conversion is necessary.

* jk/diff-do-not-reuse-wtf-needs-cleaning:
  diff: do not reuse worktree files that need "clean" conversion
2016-08-03 15:10:29 -07:00
Kevin Willford 3e8e32c32e patch-ids: add flag to create the diff patch id using header only data
This will allow a diff patch id to be created using only the header data
so that the contents of the file will not have to be loaded.

Signed-off-by: Kevin Willford <kcwillford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-29 14:10:01 -07:00
Jeff King 06dec439a3 diff: do not reuse worktree files that need "clean" conversion
When accessing a blob for a diff, we may try to reuse file
contents in the working tree, under the theory that it is
faster to mmap those file contents than it would be to
extract the content from the object database.

When we have to filter those contents, though, that
assumption does not hold. Even for our internal conversions
like CRLF, we have to allocate and fill a new buffer anyway.
But much worse, for external clean filters we have to exec
an arbitrary script, and we have no idea how expensive it
may be to run.

So let's skip this optimization when conversion into git's
"clean" form is required. This applies whenever the
"want_file" flag is false. When it's true, the caller
actually wants the smudged worktree contents, which the
reused file by definition already has (in fact, this is a
key optimization going the other direction, since reusing
the worktree file there lets us skip smudge filters).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-22 12:31:24 -07:00
Junio C Hamano a63d31b4d3 Merge branch 'bc/cocci'
Conversion from unsigned char sha1[20] to struct object_id
continues.

* bc/cocci:
  diff: convert prep_temp_blob() to struct object_id
  merge-recursive: convert merge_recursive_generic() to object_id
  merge-recursive: convert leaf functions to use struct object_id
  merge-recursive: convert struct merge_file_info to object_id
  merge-recursive: convert struct stage_data to use object_id
  diff: rename struct diff_filespec's sha1_valid member
  diff: convert struct diff_filespec to struct object_id
  coccinelle: apply object_id Coccinelle transformations
  coccinelle: convert hashcpy() with null_sha1 to hashclr()
  contrib/coccinelle: add basic Coccinelle transforms
  hex: add oid_to_hex_r()
2016-07-19 13:22:16 -07:00
brian m. carlson 09bdff29e1 diff: convert prep_temp_blob() to struct object_id
All of the callers of this function use struct object_id, so convert it
to use struct object_id in its arguments and internally.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-28 11:39:02 -07:00
brian m. carlson 41c9560ee5 diff: rename struct diff_filespec's sha1_valid member
Now that this struct's sha1 member is called "oid", update the comment
and the sha1_valid member to be called "oid_valid" instead.  The
following Coccinelle semantic patch was used to implement this, followed
by the transformations in object_id.cocci:

@@
struct diff_filespec o;
@@
- o.sha1_valid
+ o.oid_valid

@@
struct diff_filespec *p;
@@
- p->sha1_valid
+ p->oid_valid

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-28 11:39:02 -07:00
brian m. carlson a0d12c4433 diff: convert struct diff_filespec to struct object_id
Convert struct diff_filespec's sha1 member to use a struct object_id
called "oid" instead.  The following Coccinelle semantic patch was used
to implement this, followed by the transformations in object_id.cocci:

@@
struct diff_filespec o;
@@
- o.sha1
+ o.oid.hash

@@
struct diff_filespec *p;
@@
- p->sha1
+ p->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-28 11:39:02 -07:00
brian m. carlson f449198e58 coccinelle: convert hashcpy() with null_sha1 to hashclr()
hashcpy with null_sha1 as the source is equivalent to hashclr.  In
addition to being simpler, using hashclr may give the compiler a chance
to optimize better.  Convert instances of hashcpy with the source
argument of null_sha1 to hashclr.

This transformation was implemented using the following semantic patch:

@@
expression E1;
@@
-hashcpy(E1, null_sha1);
+hashclr(E1);

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-28 11:39:02 -07:00
Johannes Schindelin afc676f2c9 diff: do not color output when --color=auto and --output=<file> is given
"git diff --output=<file> --color=auto" used to show the ANSI color
sequence in the resulting file when the standard output is connected
to a terminal, because --color=auto check always checks the standard
output, not the actual file that receives the output.

We could correct this by using freopen(3) to redirect the standard
output to the specified file, which is in like with how format-patch
used to match the world order, but following the same reasoning as
the earlier "format-patch: explicitly switch off color when writing
to files", let's be more strict by bypassing the "auto" check when
the --output=<file> option is in use.

Strictly speaking, this is a backwards-incompatible change, but
it is highly unlikely that any user would want to see ANSI color
sequences in a file.

The reason this was not caught earlier is most likely that either
--output=<file> is not used, or only when stdout is redirected
anyway.

Users can still give --color=always if they want a colored diff in
the resulting file.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-28 11:26:47 -07:00
Junio C Hamano e5f7675544 Merge branch 'jk/diff-compact-heuristic'
It turns out that the earlier effort to update the heuristics may
want to use a bit more time to mature.  Turn it off by default.

* jk/diff-compact-heuristic:
  diff: disable compaction heuristic for now
2016-06-10 15:26:06 -07:00
Junio C Hamano 5580b271af diff: disable compaction heuristic for now
http://lkml.kernel.org/g/20160610075043.GA13411@sigill.intra.peff.net
reports that a change to add a new "function" with common ending
with the existing one at the end of the file is shown like this:

    def foo
      do_foo_stuff()

   +  common_ending()
   +end
   +
   +def bar
   +  do_bar_stuff()
   +
      common_ending()
    end

when the new heuristic is in use.  In reality, the change is to add
the blank line before "def bar" and everything below, which is what
the code without the new heuristic shows.

Disable the heuristics by default, and resurrect the documentation
for the option and the configuration variables, while clearly
marking the feature as still experimental.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-10 13:45:23 -07:00
Junio C Hamano 0018da1088 Merge branch 'jk/diff-compact-heuristic'
Patch output from "git diff" and friends has been tweaked to be
more readable by using a blank line as a strong hint that the
contents before and after it belong to a logically separate unit.

* jk/diff-compact-heuristic:
  diff: undocument the compaction heuristic knobs for experimentation
  xdiff: implement empty line chunk heuristic
  xdiff: add recs_match helper function
2016-05-06 14:45:46 -07:00
Stefan Beller d634d61ed6 xdiff: implement empty line chunk heuristic
In order to produce the smallest possible diff and combine several diff
hunks together, we implement a heuristic from GNU Diff which moves diff
hunks forward as far as possible when we find common context above and
below a diff hunk. This sometimes produces less readable diffs when
writing C, Shell, or other programming languages, ie:

...
 /*
+ *
+ *
+ */
+
+/*
...

instead of the more readable equivalent of

...
+/*
+ *
+ *
+ */
+
 /*
...

Implement the following heuristic to (optionally) produce the desired
output.

  If there are diff chunks which can be shifted around, shift each hunk
  such that the last common empty line is below the chunk with the rest
  of the context above.

This heuristic appears to resolve the above example and several other
common issues without producing significantly weird results. However, as
with any heuristic it is not really known whether this will always be
more optimal. Thus, it can be disabled via diff.compactionHeuristic.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-19 10:53:34 -07:00
Junio C Hamano 5d2a30d7d8 Merge branch 'mm/diff-renames-default'
The end-user facing Porcelain level commands like "diff" and "log"
now enables the rename detection by default.

* mm/diff-renames-default:
  diff: activate diff.renames by default
  log: introduce init_log_defaults()
  t: add tests for diff.renames (true/false/unset)
  t4001-diff-rename: wrap file creations in a test
  Documentation/diff-config: fix description of diff.renames
2016-04-03 10:29:22 -07:00
Junio C Hamano 11529ecec9 Merge branch 'jk/tighten-alloc'
Update various codepaths to avoid manually-counted malloc().

* jk/tighten-alloc: (22 commits)
  ewah: convert to REALLOC_ARRAY, etc
  convert ewah/bitmap code to use xmalloc
  diff_populate_gitlink: use a strbuf
  transport_anonymize_url: use xstrfmt
  git-compat-util: drop mempcpy compat code
  sequencer: simplify memory allocation of get_message
  test-path-utils: fix normalize_path_copy output buffer size
  fetch-pack: simplify add_sought_entry
  fast-import: simplify allocation in start_packfile
  write_untracked_extension: use FLEX_ALLOC helper
  prepare_{git,shell}_cmd: use argv_array
  use st_add and st_mult for allocation size computation
  convert trivial cases to FLEX_ARRAY macros
  use xmallocz to avoid size arithmetic
  convert trivial cases to ALLOC_ARRAY
  convert manual allocations to argv_array
  argv-array: add detach function
  add helpers for allocating flex-array structs
  harden REALLOC_ARRAY and xcalloc against size_t overflow
  tree-diff: catch integer overflow in combine_diff_path allocation
  ...
2016-02-26 13:37:16 -08:00
Junio C Hamano 3ed26a44b3 Merge branch 'jk/more-comments-on-textconv'
The memory ownership rule of fill_textconv() API, which was a bit
tricky, has been documented a bit better.

* jk/more-comments-on-textconv:
  diff: clarify textconv interface
2016-02-26 13:37:15 -08:00
Matthieu Moy 5404c116aa diff: activate diff.renames by default
Rename detection is a very convenient feature, and new users shouldn't
have to dig in the documentation to benefit from it.

Potential objections to activating rename detection are that it
sometimes fail, and it is sometimes slow. But rename detection is
already activated by default in several cases like "git status" and "git
merge", so activating diff.renames does not fundamentally change the
situation. When the rename detection fails, it now fails consistently
between "git diff" and "git status".

This setting does not affect plumbing commands, hence well-written
scripts will not be affected.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-25 11:31:02 -08:00
Jeff King b1ddfb9151 diff_populate_gitlink: use a strbuf
We allocate 100 bytes to hold the "Submodule commit ..."
text. This is enough, but it's not immediately obvious that
this is the case, and we have to repeat the magic 100 twice.

We could get away with xstrfmt here, but we want to know the
size, as well, so let's use a real strbuf. And while we're
here, we can clean up the logic around size_only. It
currently sets and clears the "data" field pointlessly, and
leaves the "should_free" flag on even after we have cleared
the data.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
Jeff King 96ffc06f72 convert trivial cases to FLEX_ARRAY macros
Using FLEX_ARRAY macros reduces the amount of manual
computation size we have to do. It also ensures we don't
overflow size_t, and it makes sure we write the same number
of bytes that we allocated.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
Jeff King a64e6a44c6 diff: clarify textconv interface
The memory allocation scheme for the textconv interface is a
bit tricky, and not well documented. It was originally
designed as an internal part of diff.c (matching
fill_mmfile), but gradually was made public.

Refactoring it is difficult, but we can at least improve the
situation by documenting the intended flow and enforcing it
with an in-code assertion.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 10:40:35 -08:00
Junio C Hamano 02dab5d399 Merge branch 'nd/diff-with-path-params' into maint
A few options of "git diff" did not work well when the command was
run from a subdirectory.

* nd/diff-with-path-params:
  diff: make -O and --output work in subdirectory
  diff-no-index: do not take a redundant prefix argument
2016-02-05 14:54:15 -08:00
Junio C Hamano c167a96e68 Merge branch 'nd/diff-with-path-params'
A few options of "git diff" did not work well when the command was
run from a subdirectory.

* nd/diff-with-path-params:
  diff: make -O and --output work in subdirectory
  diff-no-index: do not take a redundant prefix argument
2016-02-03 14:16:04 -08:00
Duy Nguyen a97262c62f diff: make -O and --output work in subdirectory
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-21 10:45:13 -08:00
Junio C Hamano 433cc7e3fb Merge branch 'tk/sigchain-unnecessary-post-tempfile'
Remove no-longer used #include.

* tk/sigchain-unnecessary-post-tempfile:
  shallow: remove unused #include "sigchain.h"
  read-cache: remove unused #include "sigchain.h"
  diff: remove unused #include "sigchain.h"
  credential-cache--daemon: remove unused #include "sigchain.h"
2015-10-29 13:59:18 -07:00
Tobias Klauser 086ecab1a7 diff: remove unused #include "sigchain.h"
After switching to use the tempfile module in commit 284098f1
(diff: use tempfile module), no declarations from sigchain.h are used in
diff.c anymore. Thus, remove the #include.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-22 11:12:37 -07:00
Junio C Hamano 78891795df Merge branch 'jk/war-on-sprintf'
Many allocations that is manually counted (correctly) that are
followed by strcpy/sprintf have been replaced with a less error
prone constructs such as xstrfmt.

Macintosh-specific breakage was noticed and corrected in this
reroll.

* jk/war-on-sprintf: (70 commits)
  name-rev: use strip_suffix to avoid magic numbers
  use strbuf_complete to conditionally append slash
  fsck: use for_each_loose_file_in_objdir
  Makefile: drop D_INO_IN_DIRENT build knob
  fsck: drop inode-sorting code
  convert strncpy to memcpy
  notes: document length of fanout path with a constant
  color: add color_set helper for copying raw colors
  prefer memcpy to strcpy
  help: clean up kfmclient munging
  receive-pack: simplify keep_arg computation
  avoid sprintf and strcpy with flex arrays
  use alloc_ref rather than hand-allocating "struct ref"
  color: add overflow checks for parsing colors
  drop strcpy in favor of raw sha1_to_hex
  use sha1_to_hex_r() instead of strcpy
  daemon: use cld->env_array when re-spawning
  stat_tracking_info: convert to argv_array
  http-push: use an argv_array for setup_revisions
  fetch-pack: use argv_array for index-pack / unpack-objects
  ...
2015-10-20 15:24:01 -07:00
Jeff King d59f765ac9 use sha1_to_hex_r() instead of strcpy
Before sha1_to_hex_r() existed, a simple way to get hex
sha1 into a buffer was with:

  strcpy(buf, sha1_to_hex(sha1));

This isn't wrong (assuming the buf is 41 characters), but it
makes auditing the code base for bad strcpy() calls harder,
as these become false positives.

Let's convert them to sha1_to_hex_r(), and likewise for
some calls to find_unique_abbrev(). While we're here, we'll
double-check that all of the buffers are correctly sized,
and use the more obvious GIT_SHA1_HEXSZ constant.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-05 11:08:05 -07:00
Junio C Hamano 3adc4ec7b9 Sync with v2.5.4 2015-09-28 19:16:54 -07:00
Junio C Hamano 11a458befc Sync with 2.4.10 2015-09-28 15:33:56 -07:00
Junio C Hamano 6343e2f6f2 Sync with 2.3.10 2015-09-28 15:28:31 -07:00
Jeff King 3efb988098 react to errors in xdi_diff
When we call into xdiff to perform a diff, we generally lose
the return code completely. Typically by ignoring the return
of our xdi_diff wrapper, but sometimes we even propagate
that return value up and then ignore it later.  This can
lead to us silently producing incorrect diffs (e.g., "git
log" might produce no output at all, not even a diff header,
for a content-level diff).

In practice this does not happen very often, because the
typical reason for xdiff to report failure is that it
malloc() failed (it uses straight malloc, and not our
xmalloc wrapper).  But it could also happen when xdiff
triggers one our callbacks, which returns an error (e.g.,
outf() in builtin/rerere.c tries to report a write failure
in this way). And the next patch also plans to add more
failure modes.

Let's notice an error return from xdiff and react
appropriately. In most of the diff.c code, we can simply
die(), which matches the surrounding code (e.g., that is
what we do if we fail to load a file for diffing in the
first place). This is not that elegant, but we are probably
better off dying to let the user know there was a problem,
rather than simply generating bogus output.

We could also just die() directly in xdi_diff, but the
callers typically have a bit more context, and can provide a
better message (and if we do later decide to pass errors up,
we're one step closer to doing so).

There is one interesting case, which is in diff_grep(). Here
if we cannot generate the diff, there is nothing to match,
and we silently return "no hits". This is actually what the
existing code does already, but we make it a little more
explicit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-28 14:57:10 -07:00
Jeff King 5096d4909f convert trivial sprintf / strcpy calls to xsnprintf
We sometimes sprintf into fixed-size buffers when we know
that the buffer is large enough to fit the input (either
because it's a constant, or because it's numeric input that
is bounded in size). Likewise with strcpy of constant
strings.

However, these sites make it hard to audit sprintf and
strcpy calls for buffer overflows, as a reader has to
cross-reference the size of the array with the input. Let's
use xsnprintf instead, which communicates to a reader that
we don't expect this to overflow (and catches the mistake in
case we do).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Junio C Hamano 5a4f07b322 Merge branch 'hv/submodule-config'
The gitmodules API accessed from the C code learned to cache stuff
lazily.

* hv/submodule-config:
  submodule: allow erroneous values for the fetchRecurseSubmodules option
  submodule: use new config API for worktree configurations
  submodule: extract functions for config set and lookup
  submodule: implement a config API for lookup of .gitmodules values
2015-08-31 15:38:52 -07:00
Junio C Hamano db86e61cbb Merge branch 'mh/tempfile'
The "lockfile" API has been rebuilt on top of a new "tempfile" API.

* mh/tempfile:
  credential-cache--daemon: use tempfile module
  credential-cache--daemon: delete socket from main()
  gc: use tempfile module to handle gc.pid file
  lock_repo_for_gc(): compute the path to "gc.pid" only once
  diff: use tempfile module
  setup_temporary_shallow(): use tempfile module
  write_shared_index(): use tempfile module
  register_tempfile(): new function to handle an existing temporary file
  tempfile: add several functions for creating temporary files
  prepare_tempfile_object(): new function, extracted from create_tempfile()
  tempfile: a new module for handling temporary files
  commit_lock_file(): use get_locked_file_path()
  lockfile: add accessor get_lock_file_path()
  lockfile: add accessors get_lock_file_fd() and get_lock_file_fp()
  create_bundle(): duplicate file descriptor to avoid closing it twice
  lockfile: move documentation to lockfile.h and lockfile.c
2015-08-25 14:57:09 -07:00
Heiko Voigt 851e18c385 submodule: use new config API for worktree configurations
We remove the extracted functions and directly parse into and read out
of the cache. This allows us to have one unified way of accessing
submodule configuration values specific to single submodules. Regardless
whether we need to access a configuration from history or from the
worktree.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-19 11:43:10 -07:00
Michael Haggerty 284098f13f diff: use tempfile module
Also add some code comments explaining how the fields in "struct
diff_tempfile" are used.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-12 14:49:43 -07:00
Junio C Hamano 2dded96052 Merge branch 'dt/log-follow-config'
Add a new configuration variable to enable "--follow" automatically
when "git log" is run with one pathspec argument.

* dt/log-follow-config:
  log: add "log.follow" configuration variable
2015-08-03 11:01:20 -07:00
Junio C Hamano abecddea25 Merge branch 'jc/diff-ws-error-highlight'
A hotfix to a new feature in 2.5.0-rc.

* jc/diff-ws-error-highlight:
  diff: parse ws-error-highlight option more strictly
2015-07-15 12:30:14 -07:00
René Scharfe 3f4f17b51b diff: parse ws-error-highlight option more strictly
Check if a matched token is followed by a delimiter before advancing the
pointer arg.  This avoids accepting composite words like "allnew" or
"defaultcontext" and misparsing them as "new" or "context".

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-12 09:55:23 -07:00
David Turner 076c98372e log: add "log.follow" configuration variable
People who work on projects with mostly linear history with frequent
whole file renames may want to always use "git log --follow" when
inspecting the life of the content that live in a single path.

Teach the command to behave as if "--follow" was given from the
command line when log.follow configuration variable is set *and*
there is one (and only one) path on the command line.

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-09 10:24:23 -07:00
Junio C Hamano 6998d890c7 Merge branch 'jk/color-diff-plain-is-context' into maint
"color.diff.plain" was a misnomer; give it 'color.diff.context' as
a more logical synonym.

* jk/color-diff-plain-is-context:
  diff.h: rename DIFF_PLAIN color slot to DIFF_CONTEXT
  diff: accept color.diff.context as a synonym for "plain"
2015-06-25 11:02:11 -07:00
Junio C Hamano db65170ee5 Merge branch 'jk/color-diff-plain-is-context'
"color.diff.plain" was a misnomer; give it 'color.diff.context' as
a more logical synonym.

* jk/color-diff-plain-is-context:
  diff.h: rename DIFF_PLAIN color slot to DIFF_CONTEXT
  diff: accept color.diff.context as a synonym for "plain"
2015-06-11 09:29:53 -07:00
Junio C Hamano 709cd912d4 Merge branch 'jc/diff-ws-error-highlight'
Allow whitespace breakages in deleted and context lines to be also
painted in the output.

* jc/diff-ws-error-highlight:
  diff.c: --ws-error-highlight=<kind> option
  diff.c: add emit_del_line() and emit_context_line()
  t4015: separate common setup and per-test expectation
  t4015: modernise style
2015-06-11 09:29:51 -07:00
Jeff King 8dbf3eb685 diff.h: rename DIFF_PLAIN color slot to DIFF_CONTEXT
The latter is a much more descriptive name (and we support
"color.diff.context" now). This also updates the name of any
local variables which were used to store the color.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-27 13:54:42 -07:00
Jeff King 74b15bfbf6 diff: accept color.diff.context as a synonym for "plain"
The term "plain" is a bit ambiguous; let's allow the more
specific "context", but keep "plain" around for
compatibility.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-27 13:54:37 -07:00
Junio C Hamano b8767f791c diff.c: --ws-error-highlight=<kind> option
Traditionally, we only cared about whitespace breakages introduced
in new lines.  Some people want to paint whitespace breakages on old
lines, too.  When they see a whitespace breakage on a new line, they
can spot the same kind of whitespace breakage on the corresponding
old line and want to say "Ah, those breakages are there but they
were inherited from the original, so let's not touch them for now."

Introduce `--ws-error-highlight=<kind>` option, that lets them pass
a comma separated list of `old`, `new`, and `context` to specify
what lines to highlight whitespace errors on.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-26 23:00:01 -07:00
Junio C Hamano 0e383e185a diff.c: add emit_del_line() and emit_context_line()
Traditionally, we only had emit_add_line() helper, which knows how
to find and paint whitespace breakages on the given line, because we
only care about whitespace breakages introduced in new lines.  The
context lines and old (i.e. deleted) lines are emitted with a
simpler emit_line_0() that paints the entire line in plain or old
colors.

Identify callers of emit_line_0() that show deleted lines and
context lines, have them call new helpers, emit_del_line() and
emit_context_line(), so that we can later tweak what is done to
these two classes of lines.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-26 22:13:02 -07:00
Junio C Hamano a393c6bfd9 Merge branch 'rs/deflate-init-cleanup' into maint
Code simplification.

* rs/deflate-init-cleanup:
  zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
2015-03-23 11:23:38 -07:00
Junio C Hamano 6902c4da58 Merge branch 'rs/deflate-init-cleanup'
Code simplification.

* rs/deflate-init-cleanup:
  zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
2015-03-17 16:01:26 -07:00
Junio C Hamano a4b4f9b8e3 Merge branch 'mk/diff-shortstat-dirstat-fix' into maint
"git diff --shortstat --dirstat=changes" showed a dirstat based on
lines that was never asked by the end user in addition to the
dirstat that the user asked for.

* mk/diff-shortstat-dirstat-fix:
  diff --shortstat --dirstat: remove duplicate output
2015-03-13 22:56:04 -07:00
Junio C Hamano b6488fe191 Merge branch 'mk/diff-shortstat-dirstat-fix'
"git diff --shortstat --dirstat=changes" showed a dirstat based on
lines that was never asked by the end user in addition to the
dirstat that the user asked for.

* mk/diff-shortstat-dirstat-fix:
  diff --shortstat --dirstat: remove duplicate output
2015-03-06 15:02:29 -08:00
René Scharfe 9a6f1287fb zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
Clear the git_zstream variable at the start of git_deflate_init() etc.
so that callers don't have to do that.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-05 15:46:03 -08:00
Mårten Kongstad ab27389aff diff --shortstat --dirstat: remove duplicate output
When --shortstat is used in conjunction with --dirstat=changes, git diff will
output the dirstat information twice: first as calculated by the 'lines'
algorithm, then as calculated by the 'changes' algorithm:

    $ git diff --dirstat=changes,10 --shortstat v2.2.0..v2.2.1
     23 files changed, 453 insertions(+), 54 deletions(-)
      33.5% Documentation/RelNotes/
      26.2% t/
      46.6% Documentation/RelNotes/
      16.6% t/

The same duplication happens for --shortstat together with --dirstat=files, but
not for --shortstat together with --dirstat=lines.

Limit output to only include one dirstat part, calculated as specified
by the --dirstat parameter. Also, add test for this.

Signed-off-by: Mårten Kongstad <marten.kongstad@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-02 11:31:27 -08:00
Junio C Hamano b946576839 Merge branch 'jn/parse-config-slot'
Code cleanup.

* jn/parse-config-slot:
  color_parse: do not mention variable name in error message
  pass config slots as pointers instead of offsets
2014-10-20 12:23:48 -07:00
Jeff King f6c5a2968c color_parse: do not mention variable name in error message
Originally the color-parsing function was used only for
config variables. It made sense to pass the variable name so
that the die() message could be something like:

  $ git -c color.branch.plain=bogus branch
  fatal: bad color value 'bogus' for variable 'color.branch.plain'

These days we call it in other contexts, and the resulting
error messages are a little confusing:

  $ git log --pretty='%C(bogus)'
  fatal: bad color value 'bogus' for variable '--pretty format'

  $ git config --get-color foo.bar bogus
  fatal: bad color value 'bogus' for variable 'command line'

This patch teaches color_parse to complain only about the
value, and then return an error code. Config callers can
then propagate that up to the config parser, which mentions
the variable name. Other callers can provide a custom
message. After this patch these three cases now look like:

  $ git -c color.branch.plain=bogus branch
  error: invalid color value: bogus
  fatal: unable to parse 'color.branch.plain' from command-line config

  $ git log --pretty='%C(bogus)'
  error: invalid color value: bogus
  fatal: unable to parse --pretty format

  $ git config --get-color foo.bar bogus
  error: invalid color value: bogus
  fatal: unable to parse default color value

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-14 11:01:21 -07:00
Junio C Hamano bedd3b4b7b Merge branch 'nd/large-blobs'
Teach a few codepaths to punt (instead of dying) when large blobs
that would not fit in core are involved in the operation.

* nd/large-blobs:
  diff: shortcut for diff'ing two binary SHA-1 objects
  diff --stat: mark any file larger than core.bigfilethreshold binary
  diff.c: allow to pass more flags to diff_populate_filespec
  sha1_file.c: do not die failing to malloc in unpack_compressed_entry
  wrapper.c: introduce gentle xmallocz that does not die()
2014-09-11 10:33:33 -07:00
René Scharfe d318027932 run-command: introduce CHILD_PROCESS_INIT
Most struct child_process variables are cleared using memset first after
declaration.  Provide a macro, CHILD_PROCESS_INIT, that can be used to
initialize them statically instead.  That's shorter, doesn't require a
function call and is slightly more readable (especially given that we
already have STRBUF_INIT, ARGV_ARRAY_INIT etc.).

Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-20 09:53:37 -07:00
Nguyễn Thái Ngọc Duy 1aaf69e669 diff: shortcut for diff'ing two binary SHA-1 objects
If we are given two SHA-1 and asked to determine if they are different
(but not _what_ differences), we know right away by comparing SHA-1.

A side effect of this patch is, because large files are marked binary,
diff-tree will not need to unpack them. 'diff-index --cached' will not
either. But 'diff-files' still does.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-18 10:16:55 -07:00
Nguyễn Thái Ngọc Duy 6bf3b81348 diff --stat: mark any file larger than core.bigfilethreshold binary
Too large files may lead to failure to allocate memory. If it happens
here, it could impact quite a few commands that involve
diff. Moreover, too large files are inefficient to compare anyway (and
most likely non-text), so mark them binary and skip looking at their
content.

Noticed-by: Dale R. Worley <worley@alum.mit.edu>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-18 10:16:45 -07:00
Nguyễn Thái Ngọc Duy 8e5dd3d654 diff.c: allow to pass more flags to diff_populate_filespec
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-18 10:16:35 -07:00
Junio C Hamano cfececfe1f Merge branch 'bg/xcalloc-nmemb-then-size' into maint
* bg/xcalloc-nmemb-then-size:
  transport-helper.c: rearrange xcalloc arguments
  remote.c: rearrange xcalloc arguments
  reflog-walk.c: rearrange xcalloc arguments
  pack-revindex.c: rearrange xcalloc arguments
  notes.c: rearrange xcalloc arguments
  imap-send.c: rearrange xcalloc arguments
  http-push.c: rearrange xcalloc arguments
  diff.c: rearrange xcalloc arguments
  config.c: rearrange xcalloc arguments
  commit.c: rearrange xcalloc arguments
  builtin/remote.c: rearrange xcalloc arguments
  builtin/ls-remote.c: rearrange xcalloc arguments
2014-07-22 10:25:17 -07:00
René Scharfe cedc61a998 strbuf: use strbuf_addstr() for adding C strings
Avoid code duplication and let strbuf_addstr() call strlen() for us.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-17 13:33:52 -07:00
Junio C Hamano cb4575fb18 Merge branch 'jk/diff-follow-must-take-one-pathspec' into maint
"git format-patch" did not enforce the rule that the "--follow"
option from the log/diff family of commands must be used with
exactly one pathspec.

* jk/diff-follow-must-take-one-pathspec:
  move "--follow needs one pathspec" rule to diff_setup_done
2014-06-25 11:47:23 -07:00
Jeff King 0539cc0038 stat_opt: check extra strlen call
As in earlier commits, the diff option parser uses
starts_with to find that an argument starts with "--stat-",
and then adds strlen("stat-") to find the rest of the
option.

However, in this case the starts_with and the strlen are
separated across functions, making it easy to call the
latter without the former. Let's use skip_prefix instead of
raw pointer arithmetic to catch such a case.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20 10:45:19 -07:00
Jeff King 95b567c7c3 use skip_prefix to avoid repeating strings
It's a common idiom to match a prefix and then skip past it
with strlen, like:

  if (starts_with(foo, "bar"))
	  foo += strlen("bar");

This avoids magic numbers, but means we have to repeat the
string (and there is no compiler check that we didn't make a
typo in one of the strings).

We can use skip_prefix to handle this case without repeating
ourselves.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20 10:44:45 -07:00
Jeff King ae021d8791 use skip_prefix to avoid magic numbers
It's a common idiom to match a prefix and then skip past it
with a magic number, like:

  if (starts_with(foo, "bar"))
	  foo += 3;

This is easy to get wrong, since you have to count the
prefix string yourself, and there's no compiler check if the
string changes.  We can use skip_prefix to avoid the magic
numbers here.

Note that some of these conversions could be much shorter.
For example:

  if (starts_with(arg, "--foo=")) {
	  bar = arg + 6;
	  continue;
  }

could become:

  if (skip_prefix(arg, "--foo=", &bar))
	  continue;

However, I have left it as:

  if (skip_prefix(arg, "--foo=", &v)) {
	  bar = v;
	  continue;
  }

to visually match nearby cases which need to actually
process the string. Like:

  if (skip_prefix(arg, "--foo=", &v)) {
	  bar = atoi(v);
	  continue;
  }

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20 10:44:45 -07:00
Jeff King 9e1a5ebe52 parse_diff_color_slot: drop ofs parameter
This function originally took a whole config variable name
("var") and an offset ("ofs"). It checked "var+ofs" against
each color slot, but reported errors using the whole "var".

However, since 8b8e862 (ignore unknown color configuration,
2009-12-12), it returns -1 rather than printing its own
error, and therefore only cares about var+ofs. We can drop
the ofs parameter and teach its sole caller to derive the
pointer itself.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-18 14:56:17 -07:00
Junio C Hamano a634a6d209 Merge branch 'bg/xcalloc-nmemb-then-size'
Like calloc(3), xcalloc() takes nmemb and then size.

* bg/xcalloc-nmemb-then-size:
  transport-helper.c: rearrange xcalloc arguments
  remote.c: rearrange xcalloc arguments
  reflog-walk.c: rearrange xcalloc arguments
  pack-revindex.c: rearrange xcalloc arguments
  notes.c: rearrange xcalloc arguments
  imap-send.c: rearrange xcalloc arguments
  http-push.c: rearrange xcalloc arguments
  diff.c: rearrange xcalloc arguments
  config.c: rearrange xcalloc arguments
  commit.c: rearrange xcalloc arguments
  builtin/remote.c: rearrange xcalloc arguments
  builtin/ls-remote.c: rearrange xcalloc arguments
2014-06-16 12:17:50 -07:00
Junio C Hamano b0e2c999af Merge branch 'jk/diff-follow-must-take-one-pathspec'
* jk/diff-follow-must-take-one-pathspec:
  move "--follow needs one pathspec" rule to diff_setup_done
2014-06-16 10:07:09 -07:00
Junio C Hamano 6779e43b0d Merge branch 'jk/external-diff-use-argv-array'
Code clean-up (and a bugfix which has been merged for 2.0).

* jk/external-diff-use-argv-array:
  run_external_diff: refactor cmdline setup logic
  run_external_diff: hoist common bits out of conditional
  run_external_diff: drop fflush(NULL)
  run_external_diff: clean up error handling
  run_external_diff: use an argv_array for the environment
2014-06-03 12:06:43 -07:00
Junio C Hamano 8eaf517835 Merge branch 'ks/tree-diff-nway'
Instead of running N pair-wise diff-trees when inspecting a
N-parent merge, find the set of paths that were touched by walking
N+1 trees in parallel.  These set of paths can then be turned into
N pair-wise diff-tree results to be processed through rename
detections and such.  And N=2 case nicely degenerates to the usual
2-way diff-tree, which is very nice.

* ks/tree-diff-nway:
  mingw: activate alloca
  combine-diff: speed it up, by using multiparent diff tree-walker directly
  tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
  Portable alloca for Git
  tree-diff: reuse base str(buf) memory on sub-tree recursion
  tree-diff: no need to call "full" diff_tree_sha1 from show_path()
  tree-diff: rework diff_tree interface to be sha1 based
  tree-diff: diff_tree() should now be static
  tree-diff: remove special-case diff-emitting code for empty-tree cases
  tree-diff: simplify tree_entry_pathcmp
  tree-diff: show_path prototype is not needed anymore
  tree-diff: rename compare_tree_entry -> tree_entry_pathcmp
  tree-diff: move all action-taking code out of compare_tree_entry()
  tree-diff: don't assume compare_tree_entry() returns -1,0,1
  tree-diff: consolidate code for emitting diffs and recursion in one place
  tree-diff: show_tree() is not needed
  tree-diff: no need to pass match to skip_uninteresting()
  tree-diff: no need to manually verify that there is no mode change for a path
  combine-diff: move changed-paths scanning logic into its own function
  combine-diff: move show_log_first logic/action out of paths scanning
2014-06-03 12:06:40 -07:00
Brian Gesiak 1a4927c5c5 diff.c: rearrange xcalloc arguments
xcalloc() takes two arguments: the number of elements and their size.
diffstat_add() passes the arguments in reverse order, passing the
size of a diffstat_file*, followed by the number of diffstat_file* to
be allocated.

Rearrange them so they are in the correct order.

Signed-off-by: Brian Gesiak <modocache@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-27 14:02:03 -07:00
Jeff King dd63f169d9 move "--follow needs one pathspec" rule to diff_setup_done
Because of the way "--follow" is implemented, we must have
exactly one pathspec. "git log" enforces this restriction,
but other users of the revision traversal code do not. For
example, "git format-patch --follow" will segfault during
try_to_follow_renames, as we have no pathspecs at all.

We can push this check down into diff_setup_done, which is
probably a better place anyway. It is the diff code that
introduces this restriction, so other parts of the code
should not need to care themselves.

Reported-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-20 11:09:03 -07:00
Junio C Hamano 5f11a7aad0 Merge branch 'jk/external-diff-use-argv-array' (early part)
Crash fix for codepath that miscounted the necessary size for an
array when spawning an external diff program.

* 'jk/external-diff-use-argv-array' (early part):
  run_external_diff: use an argv_array for the command line
2014-04-28 15:47:35 -07:00
Jeff King f3efe78782 run_external_diff: refactor cmdline setup logic
The current logic makes it hard to see what gets put onto
the command line in which cases. Pulling out a helper
function lets us see that we have two sets of file data, and
the second set either uses the original name, or the "other"
renamed/copy name.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-21 10:32:19 -07:00
Jeff King 0d4217d92e run_external_diff: hoist common bits out of conditional
Whether we have diff_filespecs to give to the diff command
or not, we always are going to run the program and pass it
the pathname. Let's pull that duplicated part out of the
conditional to make it more obvious.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-21 10:32:07 -07:00
Jeff King 5b88caa417 run_external_diff: drop fflush(NULL)
This fflush was added in d5535ec (Use run_command() to spawn
external diff programs instead of fork/exec., 2007-10-19),
because flushing buffers before forking is a good habit.

But later, 7d0b18a (Add output flushing before fork(),
2008-08-04) added it to the generic run-command interface,
meaning that our flush here is redundant.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-21 10:31:51 -07:00
Jeff King 89294d143d run_external_diff: clean up error handling
When the external diff reports an error, we try to clean up
and die. However, we can make this process a bit simpler:

  1. We do not need to bother freeing memory, since we are
     about to exit.  Nor do we need to clean up our
     tempfiles, since the atexit() handler will do it for
     us. So we can die as soon as we see the error.

  3. We can just call die() rather than fprintf/exit. This
     does technically change our exit code, but the exit
     code of "1" is not meaningful here. In fact, it is
     probably wrong, since "1" from diff usually means
     "completed successfully, but there were differences".

And while we're there, we can mark the error message for
translation, and drop the full stop at the end to make it
more like our other messages.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-21 10:31:36 -07:00
Jeff King ae049c955c run_external_diff: use an argv_array for the environment
We currently use static buffers and a static array for
formatting the environment passed to the external diff.
There's nothing wrong in the code, but it is much easier to
verify that it is correct if we use a dynamic argv_array.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-21 10:30:33 -07:00
Jeff King 82fbf269b9 run_external_diff: use an argv_array for the command line
We currently generate the command-line for the external
command using a fixed-length array of size 10. But if there
is a rename, we actually need 11 elements (10 items, plus a
NULL), and end up writing a random NULL onto the stack.

Rather than bump the limit, let's just use an argv_array, which
makes this sort of error impossible.

Noticed-by: Max L <infthi.inbox@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-21 10:29:50 -07:00
Jiang Xin d1d96a82bb i18n: remove obsolete comments for translators in diffstat generation
Since we do not translate diffstat any more, remove the obsolete comments.

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-17 11:09:56 -07:00
Junio C Hamano d59c12d7ad Merge branch 'jl/nor-or-nand-and'
Eradicate mistaken use of "nor" (that is, essentially "nor" used
not in "neither A nor B" ;-)) from in-code comments, command output
strings, and documentations.

* jl/nor-or-nand-and:
  code and test: fix misuses of "nor"
  comments: fix misuses of "nor"
  contrib: fix misuses of "nor"
  Documentation: fix misuses of "nor"
2014-04-08 12:00:28 -07:00
Kirill Smelkov 7195fbfaf5 combine-diff: speed it up, by using multiparent diff tree-walker directly
As was recently shown in "combine-diff: optimize
combine_diff_path sets intersection", combine-diff runs very slowly. In
that commit we optimized paths sets intersection, but that accounted
only for ~ 25% of the slowness, and as my tracing showed, for linux.git
v3.10..v3.11, for merges a lot of time is spent computing
diff(commit,commit^2) just to only then intersect that huge diff to
almost small set of files from diff(commit,commit^1).

In previous commit, we described the problem in more details, and
reworked the diff tree-walker to be general one - i.e. to work in
multiple parent case too. Now is the time to take advantage of it for
finding paths for combine diff.

The implementation is straightforward - if we know, we can get generated
diff paths directly, and at present that means no diff filtering or
rename/copy detection was requested(*), we can call multiparent tree-walker
directly and get ready paths.

(*) because e.g. at present, all diffcore transformations work on
    diff_filepair queues, but in the future, that limitation can be
    lifted, if filters would operate directly on combine_diff_paths.

Timings for `git log --raw --no-abbrev --no-renames` without `-c` ("git log")
and with `-c` ("git log -c") and with `-c --merges` ("git log -c --merges")
before and after the patch are as follows:

                linux.git v3.10..v3.11

            log     log -c     log -c --merges

    before  1.9s    16.4s      15.2s
    after   1.9s     2.4s       1.1s

The result stayed the same.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-07 14:41:49 -07:00
Kirill Smelkov 72441af7c4 tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
Previously diff_tree(), which is now named ll_diff_tree_sha1(), was
generating diff_filepair(s) for two trees t1 and t2, and that was
usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes
a commit introduces.

In Git, however, we have fundamentally built flexibility in that a
commit can have many parents - 1 for a plain commit, 2 for a simple merge,
but also more than 2 for merging several heads at once.

For merges there is a so called combine-diff, which shows diff, a merge
introduces by itself, omitting changes done by any parent. That works
through first finding paths, that are different to all parents, and then
showing generalized diff, with separate columns for +/- for each parent.
The code lives in combine-diff.c .

There is an impedance mismatch, however, in that a commit could
generally have any number of parents, and that while diffing trees, we
divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there
is no special casing for multiple parents commits in e.g.
revision-walker .

That impedance mismatch *hurts* *performance* *badly* for generating
combined diffs - in "combine-diff: optimize combine_diff_path
sets intersection" I've already removed some slowness from it, but from
the timings provided there, it could be seen, that combined diffs still
cost more than an order of magnitude more cpu time, compared to diff for
usual commits, and that would only be an optimistic estimate, if we take
into account that for e.g. linux.git there is only one merge for several
dozens of plain commits.

That slowness comes from the fact that currently, while generating
combined diff, a lot of time is spent computing diff(commit,commit^2)
just to only then intersect that huge diff to almost small set of files
from diff(commit,commit^1).

That's because at present, to compute combine-diff, for first finding
paths, that "every parent touches", we use the following combine-diff
property/definition:

D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn)      (w.r.t. paths)

where

D(A,P1...Pn) is combined diff between commit A, and parents Pi

and

D(A,Pi) is usual two-tree diff Pi..A

So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n
1-parent diffs and intersecting results will be slow.

And usually, for linux.git and other topic-based workflows, that
D(A,P2) is huge, because, if merge-base of A and P2, is several dozens
of merges (from A, via first parent) below, that D(A,P2) will be diffing
sum of merges from several subsystems to 1 subsystem.

The solution is to avoid computing n 1-parent diffs, and to find
changed-to-all-parents paths via scanning A's and all Pi's trees
simultaneously, at each step comparing their entries, and based on that
comparison, populate paths result, and deduce we could *skip*
*recursing* into subdirectories, if at least for 1 parent, sha1 of that
dir tree is the same as in A. That would save us from doing significant
amount of needless work.

Such approach is very similar to what diff_tree() does, only there we
deal with scanning only 2 trees simultaneously, and for n+1 tree, the
logic is a bit more complex:

D(T,P1...Pn) calculation scheme
-------------------------------

D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn)	(regarding resulting paths set)

    D(T,Pj)		- diff between T..Pj
    D(T,P1...Pn)	- combined diff from T to parents P1,...,Pn

We start from all trees, which are sorted, and compare their entries in
lock-step:

     T     P1       Pn
     -     -        -
    |t|   |p1|     |pn|
    |-|   |--| ... |--|      imin = argmin(p1...pn)
    | |   |  |     |  |
    |-|   |--|     |--|
    |.|   |. |     |. |
     .     .        .
     .     .        .

at any time there could be 3 cases:

    1)  t < p[imin];
    2)  t > p[imin];
    3)  t = p[imin].

Schematic deduction of what every case means, and what to do, follows:

1)  t < p[imin]  ->  ∀j t ∉ Pj  ->  "+t" ∈ D(T,Pj)  ->  D += "+t";  t↓

2)  t > p[imin]

    2.1) ∃j: pj > p[imin]  ->  "-p[imin]" ∉ D(T,Pj)  ->  D += ø;  ∀ pi=p[imin]  pi↓
    2.2) ∀i  pi = p[imin]  ->  pi ∉ T  ->  "-pi" ∈ D(T,Pi)  ->  D += "-p[imin]";  ∀i pi↓

3)  t = p[imin]

    3.1) ∃j: pj > p[imin]  ->  "+t" ∈ D(T,Pj)  ->  only pi=p[imin] remains to investigate
    3.2) pi = p[imin]  ->  investigate δ(t,pi)
     |
     |
     v

    3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø  ->

                      ⎧δ(t,pi)  - if pi=p[imin]
             ->  D += ⎨
                      ⎩"+t"     - if pi>p[imin]

    in any case t↓  ∀ pi=p[imin]  pi↓

~

For comparison, here is how diff_tree() works:

D(A,B) calculation scheme
-------------------------

    A     B
    -     -
   |a|   |b|    a < b   ->  a ∉ B   ->   D(A,B) +=  +a    a↓
   |-|   |-|    a > b   ->  b ∉ A   ->   D(A,B) +=  -b    b↓
   | |   | |    a = b   ->  investigate δ(a,b)            a↓ b↓
   |-|   |-|
   |.|   |.|
    .     .
    .     .

~~~~~~~~

This patch generalizes diff tree-walker to work with arbitrary number of
parents as described above - i.e. now there is a resulting tree t, and
some parents trees tp[i] i=[0..nparent). The generalization builds on
the fact that usual diff

D(A,B)

is by definition the same as combined diff

D(A,[B]),

so if we could rework the code for common case and make it be not slower
for nparent=1 case, usual diff(t1,t2) generation will not be slower, and
multiparent diff tree-walker would greatly benefit generating
combine-diff.

What we do is as follows:

1) diff tree-walker ll_diff_tree_sha1() is internally reworked to be
   a paths generator (new name diff_tree_paths()), with each generated path
   being `struct combine_diff_path` with info for path, new sha1,mode and for
   every parent which sha1,mode it was in it.

2) From that info, we can still generate usual diff queue with
   struct diff_filepairs, via "exporting" generated
   combine_diff_path, if we know we run for nparent=1 case.
   (see emit_diff() which is now named emit_diff_first_parent_only())

3) In order for diff_can_quit_early(), which checks

       DIFF_OPT_TST(opt, HAS_CHANGES))

   to work, that exporting have to be happening not in bulk, but
   incrementally, one diff path at a time.

   For such consumers, there is a new callback in diff_options
   introduced:

       ->pathchange(opt, struct combine_diff_path *)

   which, if set to !NULL, is called for every generated path.

   (see new compat ll_diff_tree_sha1() wrapper around new paths
    generator for setup)

4) The paths generation itself, is reworked from previous
   ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation
   scheme" provided above:

   On the start we allocate [nparent] arrays in place what was
   earlier just for one parent tree.

   then we just generalize loops, and comparison according to the
   algorithm.

Some notes(*):

1) alloca(), for small arrays, is used for "runs not slower for
   nparent=1 case than before" goal - if we change it to xmalloc()/free()
   the timings get ~1% worse. For alloca() we use just-introduced
   xalloca/xalloca_free compatibility wrappers, so it should not be a
   portability problem.

2) For every parent tree, we need to keep a tag, whether entry from that
   parent equals to entry from minimal parent. For performance reasons I'm
   keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ.
   Not doing so, we'd need to alloca another [nparent] array, which hurts
   performance.

3) For emitted paths, memory could be reused, if we know the path was
   processed via callback and will not be needed later. We use efficient
   hand-made realloc-style path_appendnew(), that saves us from ~1-1.5%
   of potential additional slowdown.

4) goto(s) are used in several places, as the code executes a little bit
   faster with lowered register pressure.

Also

- we should now check for FIND_COPIES_HARDER not only when two entries
  names are the same, and their hashes are equal, but also for a case,
  when a path was removed from some of all parents having it.

  The reason is, if we don't, that path won't be emitted at all (see
  "a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants
  all paths - with diff or without - to be emitted, to be later analyzed
  for being copies sources.

  The new check is only necessary for nparent >1, as for nparent=1 case
  xmin_eqtotal always =1 =nparent, and a path is always added to diff as
  removal.

~~~~~~~~

Timings for

    # without -c, i.e. testing only nparent=1 case
    `git log --raw --no-abbrev --no-renames`

before and after the patch are as follows:

                navy.git        linux.git v3.10..v3.11

    before      0.611s          1.889s
    after       0.619s          1.907s
    slowdown    1.3%            0.9%

This timings show we did no harm to usual diff(tree1,tree2) generation.
From the table we can see that we actually did ~1% slowdown, but I think
I've "earned" that 1% in the previous patch ("tree-diff: reuse base
str(buf) memory on sub-tree recursion", HEAD~~) so for nparent=1 case,
net timings stays approximately the same.

The output also stayed the same.

(*) If we revert 1)-4) to more usual techniques, for nparent=1 case,
    we'll get ~2-2.5% of additional slowdown, which I've tried to avoid, as
   "do no harm for nparent=1 case" rule.

For linux.git, combined diff will run an order of magnitude faster and
appropriate timings will be provided in the next commit, as we'll be
taking advantage of the new diff tree-walker for combined-diff
generation there.

P.S. and combined diff is not some exotic/for-play-only stuff - for
example for a program I write to represent Git archives as readonly
filesystem, there is initial scan with

    `git log --reverse --raw --no-abbrev --no-renames -c`

to extract log of what was created/changed when, as a result building a
map

    {}  sha1    ->  in which commit (and date) a content was added

that `-c` means also show combined diff for merges, and without them, if
a merge is non-trivial (merges changes from two parents with both having
separate changes to a file), or an evil one, the map will not be full,
i.e. some valid sha1 would be absent from it.

That case was my initial motivation for combined diffs speedup.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-07 14:40:46 -07:00
Justin Lebar 01689909eb comments: fix misuses of "nor"
Signed-off-by: Justin Lebar <jlebar@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-31 15:29:27 -07:00
Junio C Hamano a5aca6e883 Merge branch 'tr/diff-submodule-no-reuse-worktree' into maint
"git diff --external-diff" incorrectly fed the submodule directory
in the working tree to the external diff driver when it knew it is
the same as one of the versions being compared.

* tr/diff-submodule-no-reuse-worktree:
  diff: do not reuse_worktree_file for submodules
2014-03-18 14:03:41 -07:00
Junio C Hamano 34120a5fb5 Merge branch 'nd/diff-quiet-stat-dirty' into maint
"git diff --quiet -- pathspec1 pathspec2" sometimes did not return
correct status value.

* nd/diff-quiet-stat-dirty:
  diff: do not quit early on stat-dirty files
  diff.c: move diffcore_skip_stat_unmatch core logic out for reuse later
2014-03-18 13:59:56 -07:00