This variable is used as a bit field[1], and as we are about to add more
fields, indicate its usage as a bit field by making it unsigned.
[1] containing the bits
#define DIFF_PICKAXE_ALL 1
#define DIFF_PICKAXE_REGEX 2
#define DIFF_PICKAXE_KIND_S 4
#define DIFF_PICKAXE_KIND_G 8
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach diff a new algorithm, one that attempts to prevent user-specified
lines from appearing as a deletion or addition in the end result. The
end user can use this by specifying "--anchored=<text>" one or more
times when using Git commands like "diff" and "show".
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A single-word "unsigned flags" in the diff options is being split
into a structure with many bitfields.
* bw/diff-opt-impl-to-bitfields:
diff: make struct diff_flags members lowercase
diff: remove DIFF_OPT_CLR macro
diff: remove DIFF_OPT_SET macro
diff: remove DIFF_OPT_TST macro
diff: remove touched flags
diff: add flag to indicate textconv was set via cmdline
diff: convert flags to be stored in bitfields
add, reset: use DIFF_OPT_SET macro to set a diff flag
Pathspec-limited revision traversal was taught not to keep finding
unneeded differences once it knows two trees are different inside
given pathspec.
* jk/revision-pruning-optim:
revision: quit pruning diff more quickly when possible
Remove the `DIFF_OPT_CLR` macro and instead set the flags directly.
This conversion is done using the following semantic patch:
@@
expression E;
identifier fld;
@@
- DIFF_OPT_CLR(&E, fld)
+ E.flags.fld = 0
@@
type T;
T *ptr;
identifier fld;
@@
- DIFF_OPT_CLR(ptr, fld)
+ ptr->flags.fld = 0
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the `DIFF_OPT_SET` macro and instead set the flags directly.
This conversion is done using the following semantic patch:
@@
expression E;
identifier fld;
@@
- DIFF_OPT_SET(&E, fld)
+ E.flags.fld = 1
@@
type T;
T *ptr;
identifier fld;
@@
- DIFF_OPT_SET(ptr, fld)
+ ptr->flags.fld = 1
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the `DIFF_OPT_TST` macro and instead access the flags directly.
This conversion is done using the following semantic patch:
@@
expression E;
identifier fld;
@@
- DIFF_OPT_TST(&E, fld)
+ E.flags.fld
@@
type T;
T *ptr;
identifier fld;
@@
- DIFF_OPT_TST(ptr, fld)
+ ptr->flags.fld
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the set of parallel touched flags are no longer being used,
remove them.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-show is unique in that it wants to use textconv by default except
for when it is showing blobs. When asked to show a blob, show doesn't
want to use textconv unless the user explicitly requested that it be
used by providing the command line flag '--textconv'.
Currently this is done by using a parallel set of 'touched' flags which
get set every time a particular flag is set or cleared. In a future
patch we want to eliminate this parallel set of flags so instead of
relying on if the textconv flag has been touched, add a new flag
'TEXTCONV_SET_VIA_CMDLINE' which is only set if textconv is set to true
via the command line.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We cannot add many more flags to the diff machinery due to the
limitations of the number of flags that can be stored in a single
unsigned int. In order to allow for more flags to be added to the diff
machinery in the future this patch converts the flags to be stored in
bitfields in 'struct diff_flags'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the revision traversal machinery is given a pathspec,
we must compute the parent-diff for each commit to determine
which ones are TREESAME. We set the QUICK diff flag to avoid
looking at more entries than we need; we really just care
whether there are any changes at all.
But there is one case where we want to know a bit more: if
--remove-empty is set, we care about finding cases where the
change consists only of added entries (in which case we may
prune the parent in try_to_simplify_commit()). To cover that
case, our file_add_remove() callback does not quit the diff
upon seeing an added entry; it keeps looking for other types
of entries.
But this means when --remove-empty is not set (and it is not
by default), we compute more of the diff than is necessary.
You can see this in a pathological case where a commit adds
a very large number of entries, and we limit based on a
broad pathspec. E.g.:
perl -e '
chomp(my $blob = `git hash-object -w --stdin </dev/null`);
for my $a (1..1000) {
for my $b (1..1000) {
print "100644 $blob\t$a/$b\n";
}
}
' | git update-index --index-info
git commit -qm add
git rev-list HEAD -- .
This case takes about 100ms now, but after this patch only
needs 6ms. That's not a huge improvement, but it's easy to
get and it protects us against even more pathological cases
(e.g., going from 1 million to 10 million files would take
ten times as long with the current code, but not increase at
all after this patch).
This is reported to minorly speed-up pathspec limiting in
real world repositories (like the 100-million-file Windows
repository), but probably won't make a noticeable difference
outside of pathological setups.
This patch actually covers the case without --remove-empty,
and the case where we see only deletions. See the in-code
comment for details.
Note that we have to add a new member to the diff_options
struct so that our callback can see the value of
revs->remove_empty_trees. This callback parameter could be
passed to the "add_remove" and "change" callbacks, but
there's not much point. They already receive the
diff_options struct, and doing it this way avoids having to
update the function signature of the other callbacks
(arguably the format_callback and output_prefix functions
could benefit from the same simplification).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The existing behavior of diff --color-moved=zebra does not define the
minimum size of a block at all, instead relying on a heuristic applied
later to filter out sets of adjacent moved lines that are shorter than 3
lines long. This can be confusing, because a block could thus be colored
as moved at the source but not at the destination (or vice versa),
depending on its neighbors.
Instead, teach diff that the minimum size of a block is 20 alphanumeric
characters, the same heuristic used by "git blame". This allows diff to
still exclude uninteresting lines appearing on their own (such as those
solely consisting of one or a few closing braces), as was the intention
of the adjacent-moved-line heuristic.
This requires a change in some tests in that some of their lines are no
longer considered to be part of a block, because they are too short.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Any lines inside a moved block of code are not interesting. Boundaries
of blocks are only interesting if they are next to another block of moved
code.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the 'plain' mode for move detection of code. This omits the checking
for adjacent blocks, so it is not as useful. If you have a lot of the
same blocks moved in the same patch, the 'Zebra' would end up slow as it
is O(n^2) (n is number of same blocks). So this may be useful there and
is generally easy to add. Instead be very literal at the move detection,
do not skip over short blocks here.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a patch consists mostly of moving blocks of code around, it can
be quite tedious to ensure that the blocks are moved verbatim, and not
undesirably modified in the move. To that end, color blocks that are
moved within the same patch differently. For example (OM, del, add,
and NM are different colors):
[OM] -void sensitive_stuff(void)
[OM] -{
[OM] - if (!is_authorized_user())
[OM] - die("unauthorized");
[OM] - sensitive_stuff(spanning,
[OM] - multiple,
[OM] - lines);
[OM] -}
void another_function()
{
[del] - printf("foo");
[add] + printf("bar");
}
[NM] +void sensitive_stuff(void)
[NM] +{
[NM] + if (!is_authorized_user())
[NM] + die("unauthorized");
[NM] + sensitive_stuff(spanning,
[NM] + multiple,
[NM] + lines);
[NM] +}
However adjacent blocks may be problematic. For example, in this
potentially malicious patch, the swapping of blocks can be spotted:
[OM] -void sensitive_stuff(void)
[OM] -{
[OMA] - if (!is_authorized_user())
[OMA] - die("unauthorized");
[OM] - sensitive_stuff(spanning,
[OM] - multiple,
[OM] - lines);
[OMA] -}
void another_function()
{
[del] - printf("foo");
[add] + printf("bar");
}
[NM] +void sensitive_stuff(void)
[NM] +{
[NMA] + sensitive_stuff(spanning,
[NMA] + multiple,
[NMA] + lines);
[NM] + if (!is_authorized_user())
[NM] + die("unauthorized");
[NMA] +}
If the moved code is larger, it is easier to hide some permutation in the
code, which is why some alternative coloring is needed.
This patch implements the first mode:
* basic alternating 'Zebra' mode
This conveys all information needed to the user. Defer customization to
later patches.
First I implemented an alternative design, which would try to fingerprint
a line by its neighbors to detect if we are in a block or at the boundary.
This idea iss error prone as it inspected each line and its neighboring
lines to determine if the line was (a) moved and (b) if was deep inside
a hunk by having matching neighboring lines. This is unreliable as the
we can construct hunks which have equal neighbors that just exceed the
number of lines inspected. (Think of 'AXYZBXYZCXYZD..' with each letter
as a line, that is permutated to AXYZCXYZBXYZD..').
Instead this provides a dynamic programming greedy algorithm that finds
the largest moved hunk and then has several modes on highlighting bounds.
A note on the options '--submodule=diff' and '--color-words/--word-diff':
In the conversion to use emit_line in the prior patches both submodules
as well as word diff output carefully chose to call emit_line with sign=0.
All output with sign=0 is ignored for move detection purposes in this
patch, such that no weird looking output will be generated for these
cases. This leads to another thought: We could pass on '--color-moved' to
submodules such that they color up moved lines for themselves. If we'd do
so only line moves within a repository boundary are marked up.
It is useful to have moved lines colored, but there are annoying corner
cases, such as a single line moved, that is very common. For example
in a typical patch of C code, we have closing braces that end statement
blocks or functions.
While it is technically true that these lines are moved as they show up
elsewhere, it is harmful for the review as the reviewers attention is
drawn to such a minor side annoyance.
For now let's have a simple solution of hardcoding the number of
moved lines to be at least 3 before coloring them. Note, that the
length is applied across all blocks to find the 'lonely' blocks
that pollute new code, but do not interfere with a permutated
block where each permutation has less lines than 3.
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new option 'emitted_symbols' in the struct diff_options which
controls whether all output is buffered up until all output is available.
It is set internally in diff.c when necessary.
We'll have a new struct 'emitted_string' in diff.c which will be used to
buffer each line. The emitted_string will duplicate the memory of the
line to buffer as that is easiest to reason about for now. In a future
patch we may want to decrease the memory usage by not duplicating all
output for buffering but rather we may want to store offsets into the
file or in case of hunk descriptions such as the similarity score, we
could just store the relevant number and reproduce the text later on.
This approach was chosen as a first step because it is quite simple
compared to the alternative with less memory footprint.
emit_diff_symbol factors out the emission part and depending on the
diff_options->emitted_symbols the emission will be performed directly
when calling emit_diff_symbol or after the whole process is done, i.e.
by buffering we have add the possibility for a second pass over the
whole output before doing the actual output.
In 6440d34 (2012-03-14, diff: tweak a _copy_ of diff_options with
word-diff) we introduced a duplicate diff options struct for word
emissions as we may have different regex settings in there.
When buffering the output, we need to operate on just one buffer,
so we have to copy back the emissions of the word buffer into the
main buffer.
Unconditionally enable output via buffer in this patch as it yields
a great opportunity for testing, i.e. all the diff tests from the
test suite pass without having reordering issues (i.e. only parts
of the output got buffered, and we forgot to buffer other parts).
The test suite passes, which gives confidence that we converted all
functions to use emit_string for output.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We call print_stat_summary from builtin/apply, so we still
need the version with a file pointer, so introduce
print_stat_summary_0 that uses emit_string machinery and
keep print_stat_summary with the same arguments around.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As the submodule process is no longer attached to the same file pointer
'o->file' as the superprojects process, there is a different result in
color.c::check_auto_color. That is why we need to pass coloring explicitly,
such that the submodule coloring decision will be made by the child process
processing the submodule. Only DIFF_SYMBOL_SUBMODULE_PIPETHROUGH contains
color, the other symbols are for embedding the submodule output into the
superprojects output.
Remove the colors from the function signatures, as all the coloring
decisions will be made either inside the child process or the final
emit_diff_symbol, but not in the functions driving the submodule diff.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new flags field to emit_diff_symbol, that will be used by
context lines for:
* white space rules that are applicable (The first 12 bits)
Take a note in cahe.c as well, when this ws rules are extended we have
to fix the bits in the flags field.
* how the rules are evaluated (actually this double encodes the sign
of the line, but the code is easier to keep this way, bits 13,14,15)
* if the line a blank line at EOF (bit 16)
The check if new lines need to be marked up as extra lines at the end of
file, is now done unconditionally. That should be ok, as
'new_blank_line_at_eof' has a quick early return.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The internal logic used in "git blame" has been libified to make it
easier to use by cgit.
* js/blame-lib: (29 commits)
blame: move entry prepend to libgit
blame: move scoreboard setup to libgit
blame: move scoreboard-related methods to libgit
blame: move fake-commit-related methods to libgit
blame: move origin-related methods to libgit
blame: move core structures to header
blame: create entry prepend function
blame: create scoreboard setup function
blame: create scoreboard init function
blame: rework methods that determine 'final' commit
blame: wrap blame_sort and compare_blame_final
blame: move progress updates to a scoreboard callback
blame: make sanity_check use a callback in scoreboard
blame: move no_whole_file_rename flag to scoreboard
blame: move xdl_opts flags to scoreboard
blame: move show_root flag to scoreboard
blame: move reverse flag to scoreboard
blame: move contents_from to scoreboard
blame: move copy/move thresholds to scoreboard
blame: move stat counters to scoreboard
...
Convert diff_change to take a struct object_id. In addition convert the
function pointer type 'change_fn_t' to also take a struct object_id.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert diff_addremove to take a struct object_id. In addtion convert
the function pointer type 'add_remove_fn_t' to also take a struct
object_id.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
textconv_object is used in places other than blame.c and should be moved
to a more appropriate location. Other textconv related functions are
located in diff.c so that seems as good a place as any.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is needed to convert parse_tree_indirect.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since this structure handles an array of object IDs, rename it to struct
oid_array. Also rename the accessor functions and the initialization
constant.
This commit was produced mechanically by providing non-Documentation
files to the following Perl one-liners:
perl -pi -E 's/struct sha1_array/struct oid_array/g'
perl -pi -E 's/\bsha1_array_/oid_array_/g'
perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g'
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When new paths were added by "git add -N" to the index, it was
enough to circumvent the check by "git commit" to refrain from
making an empty commit without "--allow-empty". The same logic
prevented "git status" to show such a path as "new file" in the
"Changes not staged for commit" section.
* nd/ita-empty-commit:
commit: don't be fooled by ita entries when creating initial commit
commit: fix empty commit creation when there's no changes but ita entries
diff: add --ita-[in]visible-in-index
diff-lib: allow ita entries treated as "not yet exist in index"
Since we're modifying this function anyway, it's a good time
to update it to the more modern "struct oid". We can also
drop some of the magic numbers in favor of GIT_SHA1_HEXSZ,
along with some descriptive comments.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The word "align" describes how the function actually differs
from find_unique_abbrev, and will make it less confusing
when we add more diff-specific abbrevation functions that do
not do this alignment.
Since this is a globally available function, let's also move
its descriptive comment to the header file, where we
typically document function interfaces.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If i-t-a entries are present and there is no change between the index
and HEAD i-t-a entries, index_differs_from() still returns "dirty, new
entries" (aka, the resulting commit is not empty), but cache-tree will
skip i-t-a entries and produce the exact same tree of current
commit.
index_differs_from() is supposed to catch this so we can abort
git-commit (unless --no-empty is specified). Update it to optionally
ignore i-t-a entries when doing a diff between the index and HEAD so
that it would return "no change" in this case and abort commit.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When comparing the index and the working tree to show which paths are
new, and comparing the tree recorded in the HEAD and the index to see if
committing the contents recorded in the index would result in an empty
commit, we would want the former comparison to say "these are new paths"
and the latter to say "there is no change" for paths that are marked as
intent-to-add.
We made a similar attempt at d95d728a ("diff-lib.c: adjust position of
i-t-a entries in diff", 2015-03-16), which redefined the semantics of
these two comparison modes globally, which was a disaster and had to be
reverted at 78cc1a54 ("Revert "diff-lib.c: adjust position of i-t-a
entries in diff"", 2015-06-23).
To make sure we do not repeat the same mistake, introduce a new internal
diffopt option so that this different semantics can be asked for only by
callers that ask it, while making sure other unaudited callers will get
the same comparison result.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Output from "git diff" can be made easier to read by selecting
which lines are common and which lines are added/deleted
intelligently when the lines before and after the changed section
are the same. A command line option is added to help with the
experiment to find a good heuristics.
* mh/diff-indent-heuristic:
blame: honor the diff heuristic options and config
parse-options: add parse_opt_unknown_cb()
diff: improve positioning of add/delete blocks in diffs
xdl_change_compact(): introduce the concept of a change group
recs_match(): take two xrecord_t pointers as arguments
is_blank_line(): take a single xrecord_t as argument
xdl_change_compact(): only use heuristic if group can't be matched
xdl_change_compact(): fix compaction heuristic to adjust ixo
Teach "git blame" and "git annotate" the --compaction-heuristic and
--indent-heuristic options that are now supported by "git diff".
Also teach them to honor the `diff.compactionHeuristic` and
`diff.indentHeuristic` configuration options.
It would be conceivable to introduce separate configuration options for
"blame" and "annotate"; for example `blame.compactionHeuristic` and
`blame.indentHeuristic`. But it would be confusing to users if blame
output is inconsistent with diff output, so it makes more sense for them
to respect the same configuration.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git diff --submodule={short,log}" mechanism has been enhanced
to allow "--submodule=diff" to show the patch between the submodule
commits bound to the superproject.
* jk/diff-submodule-diff-inline:
diff: teach diff to display submodule difference with an inline diff
submodule: refactor show_submodule_summary with helper function
submodule: convert show_submodule_summary to use struct object_id *
allow do_submodule_path to work even if submodule isn't checked out
diff: prepare for additional submodule formats
graph: add support for --line-prefix on all graph-aware output
diff.c: remove output_prefix_length field
cache: add empty_tree_oid object and helper function
Teach git-diff and friends a new format for displaying the difference
of a submodule. The new format is an inline diff of the contents of the
submodule between the commit range of the update. This allows the user
to see the actual code change caused by a submodule update.
Add tests for the new format and option.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A future patch will add a new format for displaying the difference of
a submodule. Make it easier by changing how we store the current
selected format. Replace the DIFF_OPT flag with an enumeration, as each
format will be mutually exclusive.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an extension to git-diff and git-log (and any other graph-aware
displayable output) such that "--line-prefix=<string>" will print the
additional line-prefix on every line of output.
To make this work, we have to fix a few bugs in the graph API that force
graph_show_commit_msg to be used only when you have a valid graph.
Additionally, we extend the default_diff_output_prefix handler to work
even when no graph is enabled.
This is somewhat of a hack on top of the graph API, but I think it
should be acceptable here.
This will be used by a future extension of submodule display which
displays the submodule diff as the actual diff between the pre and post
commit in the submodule project.
Add some tests for both git-log and git-diff to ensure that the prefix
is honored correctly.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"diff/log --stat" has a logic that determines the display columns
available for the diffstat part of the output and apportions it for
pathnames and diffstat graph automatically.
5e71a84a (Add output_prefix_length to diff_options, 2012-04-16)
added the output_prefix_length field to diff_options structure to
allow this logic to subtract the display columns used for the
history graph part from the total "terminal width"; this matters
when the "git log --graph -p" option is in use.
The field must be set to the number of display columns needed to
show the output from the output_prefix() callback, which is error
prone. As there is only one user of the field, and the user has the
actual value of the prefix string, let's get rid of the field and
have the user count the display width itself.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will allow a diff patch id to be created using only the header data
so that the contents of the file will not have to be loaded.
Signed-off-by: Kevin Willford <kcwillford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The end-user facing Porcelain level commands like "diff" and "log"
now enables the rename detection by default.
* mm/diff-renames-default:
diff: activate diff.renames by default
log: introduce init_log_defaults()
t: add tests for diff.renames (true/false/unset)
t4001-diff-rename: wrap file creations in a test
Documentation/diff-config: fix description of diff.renames
Update various codepaths to avoid manually-counted malloc().
* jk/tighten-alloc: (22 commits)
ewah: convert to REALLOC_ARRAY, etc
convert ewah/bitmap code to use xmalloc
diff_populate_gitlink: use a strbuf
transport_anonymize_url: use xstrfmt
git-compat-util: drop mempcpy compat code
sequencer: simplify memory allocation of get_message
test-path-utils: fix normalize_path_copy output buffer size
fetch-pack: simplify add_sought_entry
fast-import: simplify allocation in start_packfile
write_untracked_extension: use FLEX_ALLOC helper
prepare_{git,shell}_cmd: use argv_array
use st_add and st_mult for allocation size computation
convert trivial cases to FLEX_ARRAY macros
use xmallocz to avoid size arithmetic
convert trivial cases to ALLOC_ARRAY
convert manual allocations to argv_array
argv-array: add detach function
add helpers for allocating flex-array structs
harden REALLOC_ARRAY and xcalloc against size_t overflow
tree-diff: catch integer overflow in combine_diff_path allocation
...
The memory ownership rule of fill_textconv() API, which was a bit
tricky, has been documented a bit better.
* jk/more-comments-on-textconv:
diff: clarify textconv interface
Rename detection is a very convenient feature, and new users shouldn't
have to dig in the documentation to benefit from it.
Potential objections to activating rename detection are that it
sometimes fail, and it is sometimes slow. But rename detection is
already activated by default in several cases like "git status" and "git
merge", so activating diff.renames does not fundamentally change the
situation. When the rename detection fails, it now fails consistently
between "git diff" and "git status".
This setting does not affect plumbing commands, hence well-written
scripts will not be affected.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The memory allocation scheme for the textconv interface is a
bit tricky, and not well documented. It was originally
designed as an internal part of diff.c (matching
fill_mmfile), but gradually was made public.
Refactoring it is difficult, but we can at least improve the
situation by documenting the intended flow and enforcing it
with an in-code assertion.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A combine_diff_path struct has two "flex" members allocated
alongside the struct: a string to hold the pathname, and an
array of parent pointers. We use an "int" to compute this,
meaning we may easily overflow it if the pathname is
extremely long.
We can fix this by using size_t, and checking for overflow
with the st_add helper.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few options of "git diff" did not work well when the command was
run from a subdirectory.
* nd/diff-with-path-params:
diff: make -O and --output work in subdirectory
diff-no-index: do not take a redundant prefix argument
A few options of "git diff" did not work well when the command was
run from a subdirectory.
* nd/diff-with-path-params:
diff: make -O and --output work in subdirectory
diff-no-index: do not take a redundant prefix argument
Prefix is already set up in "revs". The same prefix should be used for
all options parsing. So kill the last argument. This patch does not
actually change anything because the only caller does use the same
prefix for init_revisions() and diff_no_index().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes use 32-bit unsigned integers as bit-fields.
It's fine to access the MSB, because it's unsigned. However,
doing so as "1 << 31" is wrong, because the constant "1" is
a signed int, and we shift into the sign bit, causing
undefined behavior.
We can fix this by using "1U" as the constant.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
People who work on projects with mostly linear history with frequent
whole file renames may want to always use "git log --follow" when
inspecting the life of the content that live in a single path.
Teach the command to behave as if "--follow" was given from the
command line when log.follow configuration variable is set *and*
there is one (and only one) path on the command line.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"color.diff.plain" was a misnomer; give it 'color.diff.context' as
a more logical synonym.
* jk/color-diff-plain-is-context:
diff.h: rename DIFF_PLAIN color slot to DIFF_CONTEXT
diff: accept color.diff.context as a synonym for "plain"
Allow whitespace breakages in deleted and context lines to be also
painted in the output.
* jc/diff-ws-error-highlight:
diff.c: --ws-error-highlight=<kind> option
diff.c: add emit_del_line() and emit_context_line()
t4015: separate common setup and per-test expectation
t4015: modernise style
The latter is a much more descriptive name (and we support
"color.diff.context" now). This also updates the name of any
local variables which were used to store the color.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Traditionally, we only cared about whitespace breakages introduced
in new lines. Some people want to paint whitespace breakages on old
lines, too. When they see a whitespace breakage on a new line, they
can spot the same kind of whitespace breakage on the corresponding
old line and want to say "Ah, those breakages are there but they
were inherited from the original, so let's not touch them for now."
Introduce `--ws-error-highlight=<kind>` option, that lets them pass
a comma separated list of `old`, `new`, and `context` to specify
what lines to highlight whitespace errors on.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also, convert a constant to GIT_SHA1_HEXSZ.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of running N pair-wise diff-trees when inspecting a
N-parent merge, find the set of paths that were touched by walking
N+1 trees in parallel. These set of paths can then be turned into
N pair-wise diff-tree results to be processed through rename
detections and such. And N=2 case nicely degenerates to the usual
2-way diff-tree, which is very nice.
* ks/tree-diff-nway:
mingw: activate alloca
combine-diff: speed it up, by using multiparent diff tree-walker directly
tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
Portable alloca for Git
tree-diff: reuse base str(buf) memory on sub-tree recursion
tree-diff: no need to call "full" diff_tree_sha1 from show_path()
tree-diff: rework diff_tree interface to be sha1 based
tree-diff: diff_tree() should now be static
tree-diff: remove special-case diff-emitting code for empty-tree cases
tree-diff: simplify tree_entry_pathcmp
tree-diff: show_path prototype is not needed anymore
tree-diff: rename compare_tree_entry -> tree_entry_pathcmp
tree-diff: move all action-taking code out of compare_tree_entry()
tree-diff: don't assume compare_tree_entry() returns -1,0,1
tree-diff: consolidate code for emitting diffs and recursion in one place
tree-diff: show_tree() is not needed
tree-diff: no need to pass match to skip_uninteresting()
tree-diff: no need to manually verify that there is no mode change for a path
combine-diff: move changed-paths scanning logic into its own function
combine-diff: move show_log_first logic/action out of paths scanning
Previously diff_tree(), which is now named ll_diff_tree_sha1(), was
generating diff_filepair(s) for two trees t1 and t2, and that was
usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes
a commit introduces.
In Git, however, we have fundamentally built flexibility in that a
commit can have many parents - 1 for a plain commit, 2 for a simple merge,
but also more than 2 for merging several heads at once.
For merges there is a so called combine-diff, which shows diff, a merge
introduces by itself, omitting changes done by any parent. That works
through first finding paths, that are different to all parents, and then
showing generalized diff, with separate columns for +/- for each parent.
The code lives in combine-diff.c .
There is an impedance mismatch, however, in that a commit could
generally have any number of parents, and that while diffing trees, we
divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there
is no special casing for multiple parents commits in e.g.
revision-walker .
That impedance mismatch *hurts* *performance* *badly* for generating
combined diffs - in "combine-diff: optimize combine_diff_path
sets intersection" I've already removed some slowness from it, but from
the timings provided there, it could be seen, that combined diffs still
cost more than an order of magnitude more cpu time, compared to diff for
usual commits, and that would only be an optimistic estimate, if we take
into account that for e.g. linux.git there is only one merge for several
dozens of plain commits.
That slowness comes from the fact that currently, while generating
combined diff, a lot of time is spent computing diff(commit,commit^2)
just to only then intersect that huge diff to almost small set of files
from diff(commit,commit^1).
That's because at present, to compute combine-diff, for first finding
paths, that "every parent touches", we use the following combine-diff
property/definition:
D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn) (w.r.t. paths)
where
D(A,P1...Pn) is combined diff between commit A, and parents Pi
and
D(A,Pi) is usual two-tree diff Pi..A
So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n
1-parent diffs and intersecting results will be slow.
And usually, for linux.git and other topic-based workflows, that
D(A,P2) is huge, because, if merge-base of A and P2, is several dozens
of merges (from A, via first parent) below, that D(A,P2) will be diffing
sum of merges from several subsystems to 1 subsystem.
The solution is to avoid computing n 1-parent diffs, and to find
changed-to-all-parents paths via scanning A's and all Pi's trees
simultaneously, at each step comparing their entries, and based on that
comparison, populate paths result, and deduce we could *skip*
*recursing* into subdirectories, if at least for 1 parent, sha1 of that
dir tree is the same as in A. That would save us from doing significant
amount of needless work.
Such approach is very similar to what diff_tree() does, only there we
deal with scanning only 2 trees simultaneously, and for n+1 tree, the
logic is a bit more complex:
D(T,P1...Pn) calculation scheme
-------------------------------
D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn) (regarding resulting paths set)
D(T,Pj) - diff between T..Pj
D(T,P1...Pn) - combined diff from T to parents P1,...,Pn
We start from all trees, which are sorted, and compare their entries in
lock-step:
T P1 Pn
- - -
|t| |p1| |pn|
|-| |--| ... |--| imin = argmin(p1...pn)
| | | | | |
|-| |--| |--|
|.| |. | |. |
. . .
. . .
at any time there could be 3 cases:
1) t < p[imin];
2) t > p[imin];
3) t = p[imin].
Schematic deduction of what every case means, and what to do, follows:
1) t < p[imin] -> ∀j t ∉ Pj -> "+t" ∈ D(T,Pj) -> D += "+t"; t↓
2) t > p[imin]
2.1) ∃j: pj > p[imin] -> "-p[imin]" ∉ D(T,Pj) -> D += ø; ∀ pi=p[imin] pi↓
2.2) ∀i pi = p[imin] -> pi ∉ T -> "-pi" ∈ D(T,Pi) -> D += "-p[imin]"; ∀i pi↓
3) t = p[imin]
3.1) ∃j: pj > p[imin] -> "+t" ∈ D(T,Pj) -> only pi=p[imin] remains to investigate
3.2) pi = p[imin] -> investigate δ(t,pi)
|
|
v
3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø ->
⎧δ(t,pi) - if pi=p[imin]
-> D += ⎨
⎩"+t" - if pi>p[imin]
in any case t↓ ∀ pi=p[imin] pi↓
~
For comparison, here is how diff_tree() works:
D(A,B) calculation scheme
-------------------------
A B
- -
|a| |b| a < b -> a ∉ B -> D(A,B) += +a a↓
|-| |-| a > b -> b ∉ A -> D(A,B) += -b b↓
| | | | a = b -> investigate δ(a,b) a↓ b↓
|-| |-|
|.| |.|
. .
. .
~~~~~~~~
This patch generalizes diff tree-walker to work with arbitrary number of
parents as described above - i.e. now there is a resulting tree t, and
some parents trees tp[i] i=[0..nparent). The generalization builds on
the fact that usual diff
D(A,B)
is by definition the same as combined diff
D(A,[B]),
so if we could rework the code for common case and make it be not slower
for nparent=1 case, usual diff(t1,t2) generation will not be slower, and
multiparent diff tree-walker would greatly benefit generating
combine-diff.
What we do is as follows:
1) diff tree-walker ll_diff_tree_sha1() is internally reworked to be
a paths generator (new name diff_tree_paths()), with each generated path
being `struct combine_diff_path` with info for path, new sha1,mode and for
every parent which sha1,mode it was in it.
2) From that info, we can still generate usual diff queue with
struct diff_filepairs, via "exporting" generated
combine_diff_path, if we know we run for nparent=1 case.
(see emit_diff() which is now named emit_diff_first_parent_only())
3) In order for diff_can_quit_early(), which checks
DIFF_OPT_TST(opt, HAS_CHANGES))
to work, that exporting have to be happening not in bulk, but
incrementally, one diff path at a time.
For such consumers, there is a new callback in diff_options
introduced:
->pathchange(opt, struct combine_diff_path *)
which, if set to !NULL, is called for every generated path.
(see new compat ll_diff_tree_sha1() wrapper around new paths
generator for setup)
4) The paths generation itself, is reworked from previous
ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation
scheme" provided above:
On the start we allocate [nparent] arrays in place what was
earlier just for one parent tree.
then we just generalize loops, and comparison according to the
algorithm.
Some notes(*):
1) alloca(), for small arrays, is used for "runs not slower for
nparent=1 case than before" goal - if we change it to xmalloc()/free()
the timings get ~1% worse. For alloca() we use just-introduced
xalloca/xalloca_free compatibility wrappers, so it should not be a
portability problem.
2) For every parent tree, we need to keep a tag, whether entry from that
parent equals to entry from minimal parent. For performance reasons I'm
keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ.
Not doing so, we'd need to alloca another [nparent] array, which hurts
performance.
3) For emitted paths, memory could be reused, if we know the path was
processed via callback and will not be needed later. We use efficient
hand-made realloc-style path_appendnew(), that saves us from ~1-1.5%
of potential additional slowdown.
4) goto(s) are used in several places, as the code executes a little bit
faster with lowered register pressure.
Also
- we should now check for FIND_COPIES_HARDER not only when two entries
names are the same, and their hashes are equal, but also for a case,
when a path was removed from some of all parents having it.
The reason is, if we don't, that path won't be emitted at all (see
"a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants
all paths - with diff or without - to be emitted, to be later analyzed
for being copies sources.
The new check is only necessary for nparent >1, as for nparent=1 case
xmin_eqtotal always =1 =nparent, and a path is always added to diff as
removal.
~~~~~~~~
Timings for
# without -c, i.e. testing only nparent=1 case
`git log --raw --no-abbrev --no-renames`
before and after the patch are as follows:
navy.git linux.git v3.10..v3.11
before 0.611s 1.889s
after 0.619s 1.907s
slowdown 1.3% 0.9%
This timings show we did no harm to usual diff(tree1,tree2) generation.
From the table we can see that we actually did ~1% slowdown, but I think
I've "earned" that 1% in the previous patch ("tree-diff: reuse base
str(buf) memory on sub-tree recursion", HEAD~~) so for nparent=1 case,
net timings stays approximately the same.
The output also stayed the same.
(*) If we revert 1)-4) to more usual techniques, for nparent=1 case,
we'll get ~2-2.5% of additional slowdown, which I've tried to avoid, as
"do no harm for nparent=1 case" rule.
For linux.git, combined diff will run an order of magnitude faster and
appropriate timings will be provided in the next commit, as we'll be
taking advantage of the new diff tree-walker for combined-diff
generation there.
P.S. and combined diff is not some exotic/for-play-only stuff - for
example for a program I write to represent Git archives as readonly
filesystem, there is initial scan with
`git log --reverse --raw --no-abbrev --no-renames -c`
to extract log of what was created/changed when, as a result building a
map
{} sha1 -> in which commit (and date) a content was added
that `-c` means also show combined diff for merges, and without them, if
a merge is non-trivial (merges changes from two parents with both having
separate changes to a file), or an evil one, the map will not be full,
i.e. some valid sha1 would be absent from it.
That case was my initial motivation for combined diffs speedup.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We reworked all its users to use the functionality through
diff_tree_sha1 variant in recent patches (see "tree-diff: allow
diff_tree_sha1 to accept NULL sha1" and what comes next).
diff_tree() is now not used outside tree-diff.c - make it static.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The field was used in order to speed-up name comparison and also to
mark removed paths by setting it to 0.
Because the updated code does significantly less strcmp and also
just removes paths from the list and free right after we know a path
will not be needed, it is not needed anymore.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff ../else/where/A ../else/where/B" when ../else/where is
clearly outside the repository, and "git diff --no-index A B", do
not have to look at the index at all, but we used to read the index
unconditionally.
* tg/diff-no-index-refactor:
diff: avoid some nesting
diff: add test for --no-index executed outside repo
diff: don't read index when --no-index is given
diff: move no-index detection to builtin/diff.c
Show the total number of paths and the number of paths shown so far
when "git difftool" prompts to launch an external diff tool, which
would give users some sense of progress.
* zk/difftool-counts:
diff.c: fix some recent whitespace style violations
difftool: display the number of files in the diff queue in the prompt
Currently the --no-index option is parsed in diff_no_index(). Move the
detection if a no-index diff should be executed to builtin/diff.c, where
we can use it for executing diff_no_index() conditionally. This will
also allow us to execute other operations conditionally, which will be
done in the next patch.
There are no functional changes.
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --prompt option is set, git-difftool displays a prompt for each
modified file to be viewed in an external diff program. At that
point, it could be useful to display a counter and the total number
of files in the diff queue.
Below is the current difftool prompt for the first of 5 modified files:
Viewing: 'diff.c'
Launch 'vimdiff' [Y/n]:
Consider the modified prompt:
Viewing (1/5): 'diff.c'
Launch 'vimdiff' [Y/n]:
The current GIT_EXTERNAL_DIFF mechanism does not tell the number of
paths in the diff queue nor the current counter. To make this
"counter/total" info available for GIT_EXTERNAL_DIFF programs
without breaking existing ones by doing the following:
- Keep track of the number of paths shown so far in diff_options;
- Export two new environment variables from run_external_diff() to
show the total number of paths (from diff_queue_struct) and the
current value of the counter (from diff_options); and
- Update git-difftool--helper to use these two environment variables.
Signed-off-by: Zoltan Klinger <zoltan.klinger@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make "git grep" and "git show" pay attention to --textconv when
dealing with blob objects.
* mg/more-textconv:
grep: honor --textconv for the case rev:path
grep: allow to use textconv filters
t7008: demonstrate behavior of grep with textconv
cat-file: do not die on --textconv without textconv filters
show: honor --textconv for blobs
diff_opt: track whether flags have been set explicitly
t4030: demonstrate behavior of show with textconv
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.
* jl/submodule-mv: (53 commits)
rm: delete .gitmodules entry of submodules removed from the work tree
mv: update the path entry in .gitmodules for moved submodules
submodule.c: add .gitmodules staging helper functions
mv: move submodules using a gitfile
mv: move submodules together with their work trees
rm: do not set a variable twice without intermediate reading.
t6131 - skip tests if on case-insensitive file system
parse_pathspec: accept :(icase)path syntax
pathspec: support :(glob) syntax
pathspec: make --literal-pathspecs disable pathspec magic
pathspec: support :(literal) syntax for noglob pathspec
kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
parse_pathspec: make sure the prefix part is wildcard-free
rename field "raw" to "_raw" in struct pathspec
tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
remove match_pathspec() in favor of match_pathspec_depth()
remove init_pathspec() in favor of parse_pathspec()
remove diff_tree_{setup,release}_paths
convert common_prefix() to use struct pathspec
...
This was inherited from "show-diff -q" that was invented to tell
comparison between the index and the working tree to ignore only
removals in 2005.
These days, it is spelled as "--diff-filter=d".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reimplements the ancient "-q" option to "git diff-files" that
was inherited from "show-diff -q" in terms of "--diff-filter=d". We
will be deprecating the "-q" option, so let's issue a warning when
we do so.
Incidentally this also tentatively fixes "git diff --no-index" to
honor "-q" and hide deletions; the use will get the same warning.
We should remove the support for "-q" in a future version but it is
not that urgent.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of running strchr() on the list of status characters over
and over again, parse the --diff-filter option into bitfields and
use the bits to see if the change to the filepair matches the status
requested.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff_opt infrastructure sets flags based on defaults and command
line options. It is impossible to tell whether a flag has been set
as a default or on explicit request. Update the structure so that
this detection is possible:
* Add an extra "opt->touched_flags" that keeps track of all the
fields that have been touched by DIFF_OPT_SET and DIFF_OPT_CLR.
* You may continue setting the default values to the flags, like
commands in the "log" family do in cmd_log_init_defaults(), but
after you finished setting the defaults, you clear the
touched_flags field;
* And then you let the usual callchain call diff_opt_parse(),
allowing the opt->flags be set or unset, while keeping track of
which bits the user touched;
* There is an optional callback "opt->set_default" that is called
at the very beginning to let you inspect touched_flags and update
opt->flags appropriately, before the remainder of the diffcore
machinery is set up, taking the opt->flags value into account.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add diff.algorithm configuration so that the user does not type
"diff --histogram".
* mp/diff-algo-config:
diff: Introduce --diff-algorithm command line option
config: Introduce diff.algorithm variable
git-completion.bash: Autocomplete --minimal and --histogram for git-diff
This is a helper function to call the diff output_prefix function and
return its value as a C string, allowing us to greatly simplify
everywhere that needs to get the output prefix.
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since command line options have higher priority than config file
variables and taking previous commit into account, we need a way
how to specify myers algorithm on command line. However,
inventing `--myers` is not the right answer. We need far more
general option, and that is `--diff-algorithm`.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is used by diff-no-index.c, part of libgit.a while it stays in
builtin/diff.c. Move it to diff.c so that we won't get undefined
reference if a program that uses libgit.a happens to pull it in.
While at it, move check_pager from git.c to pager.c. It makes more
sense there and pager.c is also part of libgit.a
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
We do not want a link to 0{40} object stored anywhere in our objects.
* jk/maint-null-in-trees:
fsck: detect null sha1 in tree entries
do not write null sha1s to on-disk index
diff: do not use null sha1 as a sentinel value
diff_setup_done() has historically returned an error code, but lost
the last nonzero return in 943d5b7 (allow diff.renamelimit to be set
regardless of -M/-C, 2006-08-09). The callers were in a pretty
confused state: some actually checked for the return code, and some
did not.
Let it return void, and patch all callers to take this into account.
This conveniently also gets rid of a handful of different(!) error
messages that could never be triggered anyway.
Note that the function can still die().
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff code represents paths using the diff_filespec
struct. This struct has a sha1 to represent the sha1 of the
content at that path, as well as a sha1_valid member which
indicates whether its sha1 field is actually useful. If
sha1_valid is not true, then the filespec represents a
working tree file (e.g., for the no-index case, or for when
the index is not up-to-date).
The diff_filespec is only used internally, though. At the
interfaces to the diff subsystem, callers feed the sha1
directly, and we create a diff_filespec from it. It's at
that point that we look at the sha1 and decide whether it is
valid or not; callers may pass the null sha1 as a sentinel
value to indicate that it is not.
We should not typically see the null sha1 coming from any
other source (e.g., in the index itself, or from a tree).
However, a corrupt tree might have a null sha1, which would
cause "diff --patch" to accidentally diff the working tree
version of a file instead of treating it as a blob.
This patch extends the edges of the diff interface to accept
a "sha1_valid" flag whenever we accept a sha1, and to use
that flag when creating a filespec. In some cases, this
means passing the flag through several layers, making the
code change larger than would be desirable.
One alternative would be to simply die() upon seeing
corrupted trees with null sha1s. However, this fix more
directly addresses the problem (while bogus sha1s in a tree
are probably a bad thing, it is really the sentinel
confusion sending us down the wrong code path that is what
makes it devastating). And it means that git is more capable
of examining and debugging these corrupted trees. For
example, you can still "diff --raw" such a tree to find out
when the bogus entry was introduced; you just cannot do a
"--patch" diff (just as you could not with any other
corrupted tree, as we do not have any content to diff).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"log --graph" was not very friendly with "--stat" option and its output
had line breaks at wrong places.
By Lucian Poston (5) and Zbigniew Jędrzejewski-Szmek (2)
* lp/diffstat-with-graph:
t4052: work around shells unable to set COLUMNS to 1
Prevent graph_width of stat width from falling below min
t4052: Test diff-stat output with minimum columns
t4052: Adjust --graph --stat output for prefixes
Adjust stat width calculations to take --graph output into account
Add output_prefix_length to diff_options
t4052: test --stat output with --graph
Forbids rename detection logic from matching two empty files as renames
during merge-recursive to prevent mismerges.
By Jeff King
* jk/diff-no-rename-empty:
merge-recursive: don't detect renames of empty files
teach diffcore-rename to optionally ignore empty content
make is_empty_blob_sha1 available everywhere
drop casts from users EMPTY_TREE_SHA1_BIN
Add output_prefix_length to diff_options. Initialize the value to 0 and only
set it when graph.c:diff_output_prefix_callback() is called.
Signed-off-by: Lucian Poston <lucian.poston@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Resurrects the preparatory clean-up patches from another topic that was
discarded, as this would give a saner foundation to build on diff.algo
configuration option series.
* jc/diff-algo-cleanup:
xdiff: PATIENCE/HISTOGRAM are not independent option bits
xdiff: remove XDL_PATCH_* macros
Our rename detection is a heuristic, matching pairs of
removed and added files with similar or identical content.
It's unlikely to be wrong when there is actual content to
compare, and we already take care not to do inexact rename
detection when there is not enough content to produce good
results.
However, we always do exact rename detection, even when the
blob is tiny or empty. It's easy to get false positives with
an empty blob, simply because it is an obvious content to
use as a boilerplate (e.g., when telling git that an empty
directory is worth tracking via an empty .gitignore).
This patch lets callers specify whether or not they are
interested in using empty files as rename sources and
destinations. The default is "yes", keeping the original
behavior. It works by detecting the empty-blob sha1 for
rename sources and destinations.
One more flexible alternative would be to allow the caller
to specify a minimum size for a blob to be "interesting" for
rename detection. But that would catch small boilerplate
files, not large ones (e.g., if you had the GPL COPYING file
in many directories).
A better alternative would be to allow a "-rename"
gitattribute to allow boilerplate files to be marked as
such. I'll leave the complexity of that solution until such
time as somebody actually wants it. The complaints we've
seen so far revolve around empty files, so let's start with
the simple thing.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By Junio C Hamano (2) and Ramsay Jones (1)
* jc/pickaxe-ignore-case:
ctype.c: Fix a sparse warning
pickaxe: allow -i to search in patch case-insensitively
grep: use static trans-case table
By Zbigniew Jędrzejewski-Szmek (8) and Junio C Hamano (1)
* zj/diff-stat-dyncol:
: This breaks tests. Perhaps it is not worth using the decimal-width stuff
: for this series, at least initially.
diff --stat: add config option to limit graph width
diff --stat: enable limiting of the graph part
diff --stat: add a test for output with COLUMNS=40
diff --stat: use a maximum of 5/8 for the filename part
merge --stat: use the full terminal width
log --stat: use the full terminal width
show --stat: use the full terminal width
diff --stat: use the full terminal width
diff --stat: tests for long filenames and big change counts
A new option --stat-graph-width=<width> can be used to limit the width
of the graph part even is more space is available. Up to <width>
columns will be used for the graph.
If commits changing a lot of lines are displayed in a wide terminal
window (200 or more columns), and the +- graph uses the full width,
the output can be hard to comfortably scan with a horizontal movement
of human eyes. Messages wrapped to about 80 columns would be
interspersed with very long +- lines. It makes sense to limit the
width of the graph part to a fixed value (e.g. 70 columns), even if
more columns are available.
Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git log -S<string>" is a useful way to find the last commit in the
codebase that touched the <string>. As it was designed to be used by a
porcelain script to dig the history starting from a block of text that
appear in the starting commit, it never had to look for anything but an
exact match.
When used by an end user who wants to look for the last commit that
removed a string (e.g. name of a variable) that he vaguely remembers,
however, it is useful to support case insensitive match.
When given the "--regexp-ignore-case" (or "-i") option, which originally
was designed to affect case sensitivity of the search done in the commit
log part, e.g. "log --grep", the matches made with -S/-G pickaxe search is
done case insensitively now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because the default Myers, patience and histogram algorithms cannot be in
effect at the same time, XDL_PATIENCE_DIFF and XDL_HISTOGRAM_DIFF are not
independent bits. Instead of wasting one bit per algorithm, define a few
macros to access the few bits they occupy and update the code that access
them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --stat" and "git apply --stat" now learn to print the line
"%d files changed, %d insertions(+), %d deletions(-)" in singular form
whenever applicable. "0 insertions" and "0 deletions" are also omitted
unless they are both zero.
This matches how versions of "diffstat" that are not prehistoric produced
their output, and also makes this line translatable.
[jc: with help from Thomas Dickey in archaeology of "diffstat"]
[jc: squashed Jonathan's updates to illustrations in tutorials and a test]
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of passing the hash of a commit and then searching that
same commit in the single caller, simply pass the commit directly.
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Maintaining an array of hashes is easier using sha1_array than
open-coding it. This patch also fixes a leak of the SHA1 array
in diff_tree_combined_merge().
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the option -W/--function-context to git diff. It is similar to
the same option of git grep and expands the context of change hunks
so that the whole surrounding function is shown. This "natural"
context can allow changes to be understood better.
Note: GNU patch doesn't like diffs generated with the new option;
it seems to expect context lines to be the same before and after
changes. git apply doesn't complain.
This implementation has the same shortcoming as the one in grep,
namely that there is no way to explicitly find the end of a
function. That means that a few lines of extra context are shown,
right up to the next recognized function begins. It's already
useful in its current form, though.
The function get_func_line() in xdiff/xemit.c is extended to work
forward as well as backward to find post-context as well as
pre-context. It returns the position of the first found matching
line. The func_line parameter is made optional, as we don't need
it for -W.
The enhanced function is then used in xdl_emit_diff() to extend
the context as needed. If the added context overlaps with the
next change, it is merged into the current hunk.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/color-and-pager:
want_color: automatically fallback to color.ui
diff: don't load color config in plumbing
config: refactor get_colorbool function
color: delay auto-color decision until point of use
git_config_colorbool: refactor stdout_is_tty handling
diff: refactor COLOR_DIFF from a flag into an int
setup_pager: set GIT_PAGER_IN_USE
t7006: use test_config helpers
test-lib: add helper functions for config
t7006: modernize calls to unset
Conflicts:
builtin/commit.c
parse-options.c
This lets us store more than just a bit flag for whether we
want color; we can also store whether we want automatic
colors. This can be useful for making the automatic-color
decision closer to the point of use.
This mostly just involves replacing DIFF_OPT_* calls with
manipulations of the flag. The biggest exception is that
calls to DIFF_OPT_TST must check for "o->use_color > 0",
which lets an "unknown" value (i.e., the default) stay at
"no color". In the previous code, a value of "-1" was not
propagated at all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the "do not stop feeding the backend early" logic into a small
helper function and use it in both run_diff_files() and diff_tree() that
has the stop-early optimization. We may later add other types of diffcore
transformation that require to look at the whole result like diff-filter
does, and having the logic in a single place is essential for longer term
maintainability.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Often one is interested in the full --stat output only for commits which
change a few files, but not others, because larger restructuring gives a
--stat which fills a few screens.
Introduce a new option --stat-count=<count> which limits the --stat output
to the first <count> lines, followed by a "..." line. It can
also be given as the third parameter in
--stat=<width>,<name-width>,<count>.
Also, the unstuck form is supported analogous to the other two stat
parameters.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jh/dirstat-lines:
Mark dirstat error messages for translation
Improve error handling when parsing dirstat parameters
New --dirstat=lines mode, doing dirstat analysis based on diffstat
Allow specifying --dirstat cut-off percentage as a floating point number
Add config variable for specifying default --dirstat behavior
Refactor --dirstat parsing; deprecate --cumulative and --dirstat-by-file
Make --dirstat=0 output directories that contribute < 0.1% of changes
Add several testcases for --dirstat and friends
* jc/fix-diff-files-unmerged:
diff-files: show unmerged entries correctly
diff: remove often unused parameters from diff_unmerge()
diff.c: return filepair from diff_unmerge()
test: use $_z40 from test-lib
This patch adds an alternative implementation of show_dirstat(), called
show_dirstat_by_line(), which uses the more expensive diffstat analysis
(as opposed to show_dirstat()'s own (relatively inexpensive) analysis)
to derive the numbers from which the --dirstat output is computed.
The alternative implementation is controlled by the new "lines" parameter
to the --dirstat option (or the diff.dirstat config variable).
For binary files, the diffstat analysis counts bytes instead of lines,
so to prevent binary files from dominating the dirstat results, the
byte counts for binary files are divided by 64 before being compared to
their textual/line-based counterparts. This is a stupid and ugly - but
very cheap - heuristic.
In linux-2.6.git, running the three different --dirstat modes:
time git diff v2.6.20..v2.6.30 --dirstat=changes > /dev/null
vs.
time git diff v2.6.20..v2.6.30 --dirstat=lines > /dev/null
vs.
time git diff v2.6.20..v2.6.30 --dirstat=files > /dev/null
yields the following average runtimes on my machine:
- "changes" (default): ~6.0 s
- "lines": ~9.6 s
- "files": ~0.1 s
So, as expected, there's a considerable performance hit (~60%) by going
through the full diffstat analysis as compared to the default "changes"
analysis (obviously, "files" is much faster than both). As such, the
"lines" mode is probably only useful if you really need the --dirstat
numbers to be consistent with the numbers returned from the other
--*stat options.
The patch also includes documentation and tests for the new dirstat mode.
Improved-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Only the first digit after the decimal point is kept, as the dirstat
calculations all happen in permille.
Selftests verifying floating-point percentage input has been added.
Improved-by: Junio C Hamano <gitster@pobox.com>
Improved-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
e9c8409 (diff-index --cached --raw: show tree entry on the LHS for
unmerged entries., 2007-01-05) added a <mode, object name> pair as
parameters to this function, to store them in the pre-image side of an
unmerged file pair. Now the function is fixed to return the filepair it
queued, we can make the caller on the special case codepath to do so.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The underlying diff_queue() returns diff_filepair so that the caller can
further add information to it, and the helper function diff_unmerge()
utilizes the feature itself, but does not expose it to its callers, which
was kind of selfish.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reviewing a patch while concentrating primarily on the text after
then change, wading through pages of deleted text involves a cognitive
burden.
Introduce the -D option that omits the preimage text from the patch output
for deleted files. When used with -B (represent total rewrite as a single
wholesale deletion followed by a single wholesale addition), the preimage
text is also omitted.
To prevent such a patch from being applied by mistake, the output is
designed not to be usable by "git apply" (or GNU "patch"); it is strictly
for human consumption.
It of course is possible to "apply" such a patch by hand, as a human can
read the intention out of such a patch. It however is impossible to apply
such a patch even manually in reverse, as the whole point of this option
is to omit the information necessary to do so from the output.
Initial request by Mart Sõmermaa, documentation and tests helped by
Michael J Gruber.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there are too many paths in the project, the number of rename source
candidates "git diff -C -C" finds will exceed the rename detection limit,
and no inexact rename detection is performed. We however could fall back
to "git diff -C" if the number of modified paths is sufficiently small.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We might spend many seconds doing inexact rename detection
with no output. It's nice to let the user know that
something is actually happening.
This patch adds the infrastructure, but no callers actually
turn on progress reporting.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The warning is generated deep in the diffcore code, which
means that it will come first, followed possibly by a spew
of conflicts, making it hard to see.
Instead, let's have diffcore pass back the information about
how big the rename limit would needed to have been, and then
the caller can provide a more appropriate message (and at a
more appropriate time).
No refactoring of other non-merge callers is necessary,
because nobody else was even using the warn_on_rename_limit
feature.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The recursive merge strategy turns on rename detection but leaves the
rename threshold at the default. Add a strategy option to allow the user
to specify a rename threshold to use.
Signed-off-by: Kevin Ballard <kevin@sb.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach "-G<regexp>" that is similar to "-S<regexp> --pickaxe-regexp" to the
"git diff" family of commands. This limits the diff queue to filepairs
whose patch text actually has an added or a deleted line that matches the
given regexp. Unlike "-S<regexp>", changing other parts of the line that
has a substring that matches the given regexp IS counted as a change, as
such a change would appear as one deletion followed by one addition in a
patch text.
Unlike -S (pickaxe) that is intended to be used to quickly detect a commit
that changes the number of occurrences of hits between the preimage and
the postimage to serve as a part of larger toolchain, this is meant to be
used as the top-level Porcelain feature.
The implementation unfortunately has to run "diff" twice if you are
running "log" family of commands to produce patches in the final output
(e.g. "git log -p" or "git format-patch"). I think we _could_ cache the
result in-core if we wanted to, but that would require larger surgery to
the diffcore machinery (i.e. adding an extra pointer in the filepair
structure to keep a pointer to a strbuf around, stuff the textual diff to
the strbuf inside diffgrep_consume(), and make use of it in later stages
when it is available) and it may not be worth it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mm/shortopt-detached:
log: parse separate option for --glob
log: parse separate options like git log --grep foo
diff: parse separate options --stat-width n, --stat-name-width n
diff: split off a function for --stat-* option parsing
diff: parse separate options like -S foo
Conflicts:
revision.c
* jc/maint-follow-rename-fix:
log: test for regression introduced in v1.7.2-rc0~103^2~2
diff --follow: do call diffcore_std() as necessary
diff --follow: do not waste cycles while recursing
Usually, diff frontends populate the output queue with filepairs without
any rename information and call diffcore_std() to sort the renames out.
When --follow is in effect, however, diff-tree family of frontend has a
hack that looks like this:
diff-tree frontend
-> diff_tree_sha1()
. populate diff_queued_diff
. if --follow is in effect and there is only one change that
creates the target path, then
-> try_to_follow_renames()
-> diff_tree_sha1() with no pathspec but with -C
-> diffcore_std() to find renames
. if rename is found, tweak diff_queued_diff and put a
single filepair that records the found rename there
-> diffcore_std()
. tweak elements on diff_queued_diff by
- rename detection
- path ordering
- pickaxe filtering
We need to skip parts of the second call to diffcore_std() that is related
to rename detection, and do so only when try_to_follow_renames() did find
a rename. Earlier 1da6175 (Make diffcore_std only can run once before a
diff_flush, 2010-05-06) tried to deal with this issue incorrectly; it
unconditionally disabled any second call to diffcore_std().
This hopefully fixes the breakage.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new "ignore" config option controls the default behavior for "git
status" and the diff family. It specifies under what circumstances they
consider submodules as modified and can be set separately for each
submodule.
The command line option "--ignore-submodules=" has been extended to accept
the new parameter "none" for both status and diff.
Users that chose submodules to get rid of long work tree scanning times
might want to set the "dirty" option for those submodules. This brings
back the pre 1.7.0 behavior, where submodule work trees were never
scanned for modifications. By using "--ignore-submodules=none" on the
command line the status and diff commands can be told to do a full scan.
This option can be set to the following values (which have the same name
and meaning as for the "--ignore-submodules" option of status and diff):
"all": All changes to the submodule will be ignored.
"dirty": Only differences of the commit recorded in the superproject and
the submodules HEAD will be considered modifications, all changes
to the work tree of the submodule will be ignored. When using this
value, the submodule will not be scanned for work tree changes at
all, leading to a performance benefit on large submodules.
"untracked": Only untracked files in the submodules work tree are ignored,
a changed HEAD and/or modified files in the submodule will mark it
as modified.
"none" (which is the default): Either untracked or modified files in a
submodules work tree or a difference between the subdmodules HEAD
and the commit recorded in the superproject will make it show up
as changed. This value is added as a new parameter for the
"--ignore-submodules" option of the diff family and "git status"
so the user can override the settings in the configuration.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the option parsing logic in revision.c to accept separate forms
like `-S foo' in addition to `-Sfoo'. The rest of git already accepted
this form, but revision.c still used its own option parsing.
Short options affected are -S<string>, -l<num> and -O<orderfile>, for
which an empty string wouldn't make sense, hence -<option> <arg> isn't
ambiguous.
This patch does not handle --stat-name-width and --stat-width, which are
special-cases where diff_long_opt do not apply. They are handled in a
separate patch to ease review.
Original patch by Matthieu Moy, plus refactoring by Jonathan Nieder.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* gv/portable:
test-lib: use DIFF definition from GIT-BUILD-OPTIONS
build: propagate $DIFF to scripts
Makefile: Tru64 portability fix
Makefile: HP-UX 10.20 portability fixes
Makefile: HPUX11 portability fixes
Makefile: SunOS 5.6 portability fix
inline declaration does not work on AIX
Allow disabling "inline"
Some platforms lack socklen_t type
Make NO_{INET_NTOP,INET_PTON} configured independently
Makefile: some platforms do not have hstrerror anywhere
git-compat-util.h: some platforms with mmap() lack MAP_FAILED definition
test_cmp: do not use "diff -u" on platforms that lack one
fixup: do not unconditionally disable "diff -u"
tests: use "test_cmp", not "diff", when verifying the result
Do not use "diff" found on PATH while building and installing
enums: omit trailing comma for portability
Makefile: -lpthread may still be necessary when libc has only pthread stubs
Rewrite dynamic structure initializations to runtime assignment
Makefile: pass CPPFLAGS through to fllow customization
Conflicts:
Makefile
wt-status.h
The textconv functionality allows one to convert a file into text before
running diff. But this functionality can be useful to other features
such as blame.
Signed-off-by: Axel Bonnet <axel.bonnet@ensimag.imag.fr>
Signed-off-by: Clément Poulain <clement.poulain@ensimag.imag.fr>
Signed-off-by: Diane Gasselin <diane.gasselin@ensimag.imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The callback can be used to add some prefix string to each line of
diff output.
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Without this patch at least IBM VisualAge C 5.0 (I have 5.0.2) on AIX
5.1 fails to compile git.
enum style is inconsistent already, with some enums declared on one
line, some over 3 lines with the enum values all on the middle line,
sometimes with 1 enum value per line... and independently of that the
trailing comma is sometimes present and other times absent, often
mixing with/without trailing comma styles in a single file, and
sometimes in consecutive enum declarations.
Clearly, omitting the comma is the more portable style, and this patch
changes all enum declarations to use the portable omitted dangling
comma style consistently.
Signed-off-by: Gary V. Vaughan <gary@thewrittenword.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This teaches the --color-words engine a more general interface that
supports two new modes:
* --word-diff=plain, inspired by the 'wdiff' utility (most similar to
'wdiff -n <old> <new>'): uses delimiters [-removed-] and {+added+}
* --word-diff=porcelain, which generates an ad-hoc machine readable
format:
- each diff unit is prefixed by [-+ ] and terminated by newline as
in unified diff
- newlines in the input are output as a line consisting only of a
tilde '~'
Both of these formats still support color if it is enabled, using it
to highlight the differences. --color-words becomes a synonym for
--word-diff=color, which is the color-only format. Also adds some
compatibility/convenience options.
Thanks to Junio C Hamano and Miles Bader for good ideas.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 1.7.0 submodules are considered dirty when they contain untracked
files. But when git status is called with the "-uno" option, the user
asked to ignore untracked files, so they must be ignored in submodules
too. To achieve this, the new flag DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES
is introduced.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 1.7.0 there are three reasons a submodule is considered modified
against the work tree: It contains new commits, modified content or
untracked content. Lets show all reasons in the long format of git status,
so the user can better asses the nature of the modification. This change
does not affect the short and porcelain formats.
Two new members are added to "struct wt_status_change_data" to store the
information gathered by run_diff_files(). wt-status.c uses the new flag
DIFF_OPT_DIRTY_SUBMODULES to tell diff-lib.c it wants to get detailed
dirty information about submodules.
A hint line for submodules is printed in the dirty header when dirty
submodules are present.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/fix-tree-walk:
read-tree --debug-unpack
unpack-trees.c: look ahead in the index
unpack-trees.c: prepare for looking ahead in the index
Aggressive three-way merge: fix D/F case
traverse_trees(): handle D/F conflict case sanely
more D/F conflict tests
tests: move convenience regexp to match object names to test-lib.sh
Conflicts:
builtin-read-tree.c
unpack-trees.c
unpack-trees.h
In the worst case is_submodule_modified() got called three times for
each submodule. The information we got from scanning the whole
submodule tree the first time can be reused instead.
New parameters have been added to diff_change() and diff_addremove(),
the information is stored in a new member of struct diff_filespec. Its
value is then reused instead of calling is_submodule_modified() again.
When no explicit "-dirty" is needed in the output the call to
is_submodule_modified() is not necessary when the submodules HEAD
already disagrees with the ref of the superproject, as this alone
marks it as modified. To achieve that, get_stat_data() got an extra
argument.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes the traversal of index be in sync with the tree traversal.
When unpack_callback() is fed a set of tree entries from trees, it
inspects the name of the entry and checks if the an index entry with
the same name could be hiding behind the current index entry, and
(1) if the name appears in the index as a leaf node, it is also
fed to the n_way_merge() callback function;
(2) if the name is a directory in the index, i.e. there are entries in
that are underneath it, then nothing is fed to the n_way_merge()
callback function;
(3) otherwise, if the name comes before the first eligible entry in the
index, the index entry is first unpacked alone.
When traverse_trees_recursive() descends into a subdirectory, the
cache_bottom pointer is moved to walk index entries within that directory.
All of these are omitted for diff-index, which does not even want to be
fed an index entry and a tree entry with D/F conflicts.
This fixes 3-way read-tree and exposes a bug in other parts of the system
in t6035, test #5. The test prepares these three trees:
O = HEAD^
100644 blob e69de29bb2 a/b-2/c/d
100644 blob e69de29bb2 a/b/c/d
100644 blob e69de29bb2 a/x
A = HEAD
100644 blob e69de29bb2 a/b-2/c/d
100644 blob e69de29bb2 a/b/c/d
100644 blob 587be6b4c3f93f93c489c0111bba5596147a26cb a/x
B = master
120000 blob a36b77384451ea1de7bd340ffca868249626bc52 a/b
100644 blob e69de29bb2 a/b-2/c/d
100644 blob e69de29bb2 a/x
With a clean index that matches HEAD, running
git read-tree -m -u --aggressive $O $A $B
now yields
120000 a36b77384451ea1de7bd340ffca868249626bc52 3 a/b
100644 e69de29bb2 0 a/b-2/c/d
100644 e69de29bb2 1 a/b/c/d
100644 e69de29bb2 2 a/b/c/d
100644 587be6b4c3f93f93c489c0111bba5596147a26cb 0 a/x
which is correct. "master" created "a/b" symlink that did not exist,
and removed "a/b/c/d" while HEAD did not do touch either path.
Before this series, read-tree did not notice the situation and resolved
addition of "a/b" and removal of "a/b/c/d" independently. If A = HEAD had
another path "a/b/c/e" added, this merge should conflict but instead it
silently resolved "a/b" and then immediately overwrote it to add
"a/b/c/e", which was quite bogus.
Tests in t1012 start to work with this.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/1.7.0-diff-whitespace-only-status:
diff.c: fix typoes in comments
Make test case number unique
diff: Rename QUIET internal option to QUICK
diff: change semantics of "ignore whitespace" options
Conflicts:
diff.h
Inspired by the coloring of quilt.
Introduce a separate color and paint the hunk comment part, i.e. the name
of the function, in a separate color "diff.func" (defaults to plain).
Whitespace between hunk header and hunk comment is printed in plain color.
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When you use the option --submodule=log you can see the submodule
summaries inlined in the diff, instead of not-quite-helpful SHA-1 pairs.
The format imitates what "git submodule summary" shows.
To do that, <path>/.git/objects/ is added to the alternate object
databases (if that directory exists).
This option was requested by Jens Lehmann at the GitTogether in Berlin.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The option "QUIET" primarily meant "find if we have _any_ difference as
quick as possible and report", which means we often do not even have to
look at blobs if we know the trees are different by looking at the higher
level (e.g. "diff-tree A B"). As a side effect, because there is no point
showing one change that we happened to have found first, it also enables
NO_OUTPUT and EXIT_WITH_STATUS options, making the end result look quiet.
Rename the internal option to QUICK to reflect this better; it also makes
grepping the source tree much easier, as there are other kinds of QUIET
option everywhere.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Traditionally, the --ignore-whitespace* options have merely meant to tell
the diff output routine that some class of differences are not worth
showing in the textual diff output, so that the end user has easier time
to review the remaining (presumably more meaningful) changes. These
options never affected the outcome of the command, given as the exit
status when the --exit-code option was in effect (either directly or
indirectly).
When you have only whitespace changes, however, you might expect
git diff -b --exit-code
to report that there is _no_ change with zero exit status.
Change the semantics of --ignore-whitespace* options to mean more than
"omit showing the difference in text".
The exit status, when --exit-code is in effect, is computed by checking if
we found any differences at the path level, while diff frontends feed
filepairs to the diffcore engine. When "ignore whitespace" options are in
effect, we defer this determination until the very end of diffcore
transformation. We simply do not know until the textual diff is
generated, which comes very late in the pipeline.
When --quiet is in effect, various diff frontends optimize by breaking out
early from the loop that enumerates the filepairs, when we find the first
path level difference; when --ignore-whitespace* is used the above change
automatically disables this optimization.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
index_is_dirty() in builtin-revert.c checks if the index is dirty.
This patch generalizes this function to check if the index differs
from a revision, i.e. the former index_is_dirty() behavior can now be
achieved by index_differs_from("HEAD", 0).
The second argument "diff_flags" allows to set further diff option
flags like DIFF_OPT_IGNORE_SUBMODULES. See DIFF_OPT_* macros in diff.h
for a list.
index_differs_from() seems to be useful for more than builtin-revert.c,
so it is moved into diff-lib.c and also used in builtin-commit.c.
Yet to mention:
- "rev.abbrev = 0;" can be safely removed.
This has no impact on performance or functioning of neither
setup_revisions() nor run_diff_index().
- rev.pending.objects is free()d because this fixes a leak.
(Also see 295dd2ad "Fix memory leak in traverse_commit_list")
Mentored-by: Daniel Barkalow <barkalow@iabervon.org>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some applications, words are not delimited by white space. To
allow for that, you can specify a regular expression describing
what makes a word with
git diff --color-words='[A-Za-z0-9]+'
Note that words cannot contain newline characters.
As suggested by Thomas Rast, the words are the exact matches of the
regular expression.
Note that a regular expression beginning with a '^' will match only
a word at the beginning of the hunk, not a word at the beginning of
a line, and is probably not what you want.
This commit contains a quoting fix by Thomas Rast.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Merge two hunks if there is only the specified number of otherwise unshown
context between them. For --inter-hunk-context=1, the resulting patch has
the same number of lines but shows uninterrupted context instead of a
context header line in between.
Patches generated with this option are easier to read but are also more
likely to conflict if the file to be patched contains other changes.
This patch keeps the default for this option at 0. It is intended to just
make the feature available in order to see its advantages and downsides.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>