Commit graph

60 commits

Author SHA1 Message Date
Nguyễn Thái Ngọc Duy bd1928df1d remove diff_tree_{setup,release}_paths
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:09 -07:00
Nguyễn Thái Ngọc Duy 8f4f8f4579 guard against new pathspec magic in pathspec matching code
GUARD_PATHSPEC() marks pathspec-sensitive code, basically all those
that touch anything in 'struct pathspec' except fields "nr" and
"original". GUARD_PATHSPEC() is not supposed to fail. It's mainly to
help the designers catch unsupported codepaths.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:07 -07:00
Nguyễn Thái Ngọc Duy 6330a17199 parse_pathspec: add special flag for max_depth feature
match_pathspec_depth() and tree_entry_interesting() check max_depth
field in order to support "git grep --max-depth". The feature
activation is tied to "recursive" field, which led to some unwanted
activation, e.g. 5c8eeb8 (diff-index: enable recursive pathspec
matching in unpack_trees - 2012-01-15).

This patch decouples the activation from "recursive" field, puts it in
"magic" field instead. This makes sure that only "git grep" can
activate this feature. And because parse_pathspec knows when the
feature is not used, it does not need to sort pathspec (required for
max_depth to work correctly). A small win for non-grep cases.

Even though a new magic flag is introduced, no magic syntax is. The
magic can be only enabled by parse_pathspec() caller. We might someday
want to support ":(maxdepth:10)src." It all depends on actual use
cases.

max_depth feature cannot be enabled via init_pathspec() anymore. But
that's ok because init_pathspec() is on its way to /dev/null.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:06 -07:00
Junio C Hamano 3b753148b6 Merge branch 'jk/maint-null-in-trees'
We do not want a link to 0{40} object stored anywhere in our objects.

* jk/maint-null-in-trees:
  fsck: detect null sha1 in tree entries
  do not write null sha1s to on-disk index
  diff: do not use null sha1 as a sentinel value
2012-08-27 11:54:28 -07:00
Thomas Rast 28452655af diff_setup_done(): return void
diff_setup_done() has historically returned an error code, but lost
the last nonzero return in 943d5b7 (allow diff.renamelimit to be set
regardless of -M/-C, 2006-08-09).  The callers were in a pretty
confused state: some actually checked for the return code, and some
did not.

Let it return void, and patch all callers to take this into account.
This conveniently also gets rid of a handful of different(!) error
messages that could never be triggered anyway.

Note that the function can still die().

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-03 12:11:07 -07:00
Jeff King e54501004a diff: do not use null sha1 as a sentinel value
The diff code represents paths using the diff_filespec
struct. This struct has a sha1 to represent the sha1 of the
content at that path, as well as a sha1_valid member which
indicates whether its sha1 field is actually useful. If
sha1_valid is not true, then the filespec represents a
working tree file (e.g., for the no-index case, or for when
the index is not up-to-date).

The diff_filespec is only used internally, though. At the
interfaces to the diff subsystem, callers feed the sha1
directly, and we create a diff_filespec from it. It's at
that point that we look at the sha1 and decide whether it is
valid or not; callers may pass the null sha1 as a sentinel
value to indicate that it is not.

We should not typically see the null sha1 coming from any
other source (e.g., in the index itself, or from a tree).
However, a corrupt tree might have a null sha1, which would
cause "diff --patch" to accidentally diff the working tree
version of a file instead of treating it as a blob.

This patch extends the edges of the diff interface to accept
a "sha1_valid" flag whenever we accept a sha1, and to use
that flag when creating a filespec. In some cases, this
means passing the flag through several layers, making the
code change larger than would be desirable.

One alternative would be to simply die() upon seeing
corrupted trees with null sha1s. However, this fix more
directly addresses the problem (while bogus sha1s in a tree
are probably a bad thing, it is really the sentinel
confusion sending us down the wrong code path that is what
makes it devastating). And it means that git is more capable
of examining and debugging these corrupted trees. For
example, you can still "diff --raw" such a tree to find out
when the bogus entry was introduced; you just cannot do a
"--patch" diff (just as you could not with any other
corrupted tree, as we do not have any content to diff).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-29 15:04:32 -07:00
Jeff King dd98d88be7 use custom rename score during --follow
If you provide a custom rename score on the command line,
like:

  git log -M50 --follow foo.c

it is completely ignored, and there is no way to --follow
with a looser rename score. Instead, let's use the same
rename score that will be used for generating diffs. This is
convenient, and mirrors what we do with the break-score.

You can see an example of it being useful in git.git:

  $ git log --oneline --summary --follow \
	    Documentation/technical/api-string-list.txt
  86d4b52 string-list: Add API to remove an item from an unsorted list
  1d2f80f string_list: Fix argument order for string_list_append
  e242148 string-list: add unsorted_string_list_lookup()
  0dda1d1 Fix two leftovers from path_list->string_list
  c455c87 Rename path_list to string_list
   create mode 100644 Documentation/technical/api-string-list.txt

  $ git log --oneline --summary -M40 --follow \
	  Documentation/technical/api-string-list.txt
  86d4b52 string-list: Add API to remove an item from an unsorted list
  1d2f80f string_list: Fix argument order for string_list_append
  e242148 string-list: add unsorted_string_list_lookup()
  0dda1d1 Fix two leftovers from path_list->string_list
  c455c87 Rename path_list to string_list
   rename Documentation/technical/{api-path-list.txt => api-string-list.txt} (47%)
  328a475 path-list documentation: document all functions and data structures
  530e741 Start preparing the API documents.
   create mode 100644 Documentation/technical/api-path-list.txt

You could have two separate rename scores, one for following
and one for diff. But almost nobody is going to want that,
and it would just be unnecessarily confusing. Besides which,
we re-use the diff results from try_to_follow_renames for
the actual diff output, which means having them as separate
scores is actively wrong. E.g., with the current code, you
get:

  $ git log --oneline --diff-filter=R --name-status \
            -M90 --follow git.spec.in
  27dedf0 GIT 0.99.9j aka 1.0rc3
  R084    git-core.spec.in        git.spec.in
  f85639c Rename the RPM from "git" to "git-core"
  R098    git.spec.in     git-core.spec.in

The first one should not be considered a rename by the -M
score we gave, but we print it anyway, since we blindly
re-use the diff information from the follow (which uses the
default score). So this could also be considered simply a
bug-fix, as with the current code "-M" is completely ignored
when using "--follow".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-16 12:33:49 -08:00
Nguyễn Thái Ngọc Duy d688cf07b1 tree_entry_interesting(): give meaningful names to return values
It is a basic code hygiene to avoid magic constants that are unnamed.
Besides, this helps extending the value later on for "interesting, but
cannot decide if the entry truely matches yet" (ie. prefix matches)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27 11:38:24 -07:00
Nguyễn Thái Ngọc Duy 0de1633783 tree-walk.c: do not leak internal structure in tree_entry_len()
tree_entry_len() does not simply take two random arguments and return
a tree length. The two pointers must point to a tree item structure,
or struct name_entry. Passing random pointers will return incorrect
value.

Force callers to pass struct name_entry instead of two pointers (with
hope that they don't manually construct struct name_entry themselves)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27 11:08:26 -07:00
Junio C Hamano 456a4c08b8 Merge branch 'jk/diff-not-so-quick'
* jk/diff-not-so-quick:
  diff: futureproof "stop feeding the backend early" logic
  diff_tree: disable QUICK optimization with diff filter

Conflicts:
	diff.c
2011-06-06 11:40:14 -07:00
Junio C Hamano 28b9264dd6 diff: futureproof "stop feeding the backend early" logic
Refactor the "do not stop feeding the backend early" logic into a small
helper function and use it in both run_diff_files() and diff_tree() that
has the stop-early optimization. We may later add other types of diffcore
transformation that require to look at the whole result like diff-filter
does, and having the logic in a single place is essential for longer term
maintainability.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-31 09:21:36 -07:00
Jeff King af7b41c923 diff_tree: disable QUICK optimization with diff filter
We stop looking for changes early with QUICK, so our diff
queue contains only a subset of the changes. However, we
don't apply diff filters until later; it will appear at that
point as though there are no changes matching our filter,
when in reality we simply didn't keep looking for changes
long enough.

Commit 2cfe8a6 (diff --quiet: disable optimization when
--diff-filter=X is used, 2011-03-16) fixes this in some
cases by disabling the optimization when a filter is
present. However, it only tweaked run_diff_files, missing
the similar case in diff_tree. Thus the fix worked only for
diffing the working tree and index, but not between trees.

Noticed by Yasushi SHOJI.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-31 09:20:31 -07:00
Junio C Hamano 1273738f05 Merge branch 'nd/struct-pathspec'
* nd/struct-pathspec:
  pathspec: rename per-item field has_wildcard to use_wildcard
  Improve tree_entry_interesting() handling code
  Convert read_tree{,_recursive} to support struct pathspec
  Reimplement read_tree_recursive() using tree_entry_interesting()
2011-05-06 10:50:06 -07:00
Nguyễn Thái Ngọc Duy 97d0b74a49 Improve tree_entry_interesting() handling code
t_e_i() can return -1 or 2 to early shortcut a search. Current code
may use up to two variables to handle it. One for saving return value
from t_e_i temporarily, one for saving return code 2.

The second variable is not needed. If we make sure the first variable
does not change until the next t_e_i() call, then we can do something
like this:

int ret = 0;

while (...) {
	if (ret != 2) {
		ret = t_e_i();
		if (ret < 0) /* no longer interesting */
			break;
		if (ret == 0) /* skip this round */
			continue;
	}
	/* ret > 0, interesting */
}

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-25 09:20:33 -07:00
Johannes Schindelin c0aa335c95 Remove unused variables
Noticed by gcc 4.6.0.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-22 11:43:27 -07:00
Nguyễn Thái Ngọc Duy 1376e50723 grep: drop pathspec_matches() in favor of tree_entry_interesting()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03 14:08:31 -08:00
Nguyễn Thái Ngọc Duy bc96cc87db tree_entry_interesting(): support depth limit
This is needed to replace pathspec_matches() in builtin/grep.c.

max_depth == -1 means infinite depth. Depth limit is only effective
when pathspec.recursive == 1. When pathspec.recursive == 0, the
behavior depends on match functions: non-recursive for
tree_entry_interesting() and recursive for match_pathspec{,_depth}

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03 14:08:30 -08:00
Nguyễn Thái Ngọc Duy 48932677d6 diff-tree: convert base+baselen to writable strbuf
In traversing trees, a full path is splitted into two parts: base
directory and entry. They are however quite often concatenated
whenever a full path is needed. Current code allocates a new buffer,
do two memcpy(), use it, then release.

Instead this patch turns "base" to a writable, extendable buffer. When
a concatenation is needed, the callee only needs to append "entry" to
base, use it, then truncate the entry out again. "base" must remain
unchanged before and after entering a function.

This avoids quite a bit of malloc() and memcpy().

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03 14:08:16 -08:00
Nguyễn Thái Ngọc Duy 2c389fc8ec Move tree_entry_interesting() to tree-walk.c and export it
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03 13:22:58 -08:00
Nguyễn Thái Ngọc Duy 475005a117 tree_entry_interesting(): remove dependency on struct diff_options
This function can be potentially used in more places than just
tree-diff.c. "struct diff_options" does not make much sense outside
diff_tree_sha1().

While removing the use of diff_options, it also removes
tree_entry_extract() call, which means S_ISDIR() uses the entry->mode
directly, without being filtered by canon_mode() (called internally
inside tree_entry_extract).

The only use of the mode information in this function is to check the
type of the entry by giving it to S_ISDIR() macro, and the result does
not change with or without canon_mode(), so it is ok to bypass
tree_entry_extract().

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03 13:04:23 -08:00
Nguyễn Thái Ngọc Duy 66f136252f Convert struct diff_options to use struct pathspec
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03 12:28:15 -08:00
Junio C Hamano acbaf54f30 Merge branch 'en/tree-walk-optim'
* en/tree-walk-optim:
  diff_tree(): Skip skip_uninteresting() when all remaining paths interesting
  tree_entry_interesting(): Make return value more specific
  tree-walk: Correct bitrotted comment about tree_entry()
  Document pre-condition for tree_entry_interesting
2010-10-26 21:37:49 -07:00
Elijah Newren 7e1ec0d415 diff_tree(): Skip skip_uninteresting() when all remaining paths interesting
In 1d848f6 (tree_entry_interesting(): allow it to say "everything is
interesting" 2007-03-21), both show_tree() and skip_uninteresting() were
modified to determine if all remaining tree entries were interesting.
However, the latter returns as soon as it finds the first interesting path,
without any way to signal to its caller (namely, diff_tree()) that all
remaining paths are interesting, making these extra checks useless.

Pass whether all remaining entries are interesting back to diff_tree(), and
whenever they are, have diff_tree() skip subsequent calls to
skip_uninteresting().

With this change, I measure speedups of 3-4% for the commands

  $ git rev-list --quiet HEAD -- Documentation/
  $ git rev-list --quiet HEAD -- t/

in git.git.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-26 09:53:36 -07:00
Elijah Newren 4a5e74feb1 tree_entry_interesting(): Make return value more specific
tree_entry_interesting() can signal to its callers not only if the given
entry matches one of the specified paths, but whether all remaining paths
will (or will not) match.  When no paths are specified, all paths are
considered interesting, so intead of returning 1 (this path is interesting)
return 2 (all paths are interesting).

This will allow the caller to avoid calling tree_entry_interesting() again,
which theoretically should speed up tree walking.  I am not able to measure
any actual gains in practice, but it certainly can not hurt and seems to
make the code more readable to me.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-26 09:53:36 -07:00
Elijah Newren b6b987a094 Document pre-condition for tree_entry_interesting
tree_entry_interesting will fail to find appropriate matches if the base
directory path is not terminated with a slash.  Knowing this earlier would
have saved me some debugging time.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-26 09:53:35 -07:00
Junio C Hamano 44c48a909a diff --follow: do call diffcore_std() as necessary
Usually, diff frontends populate the output queue with filepairs without
any rename information and call diffcore_std() to sort the renames out.
When --follow is in effect, however, diff-tree family of frontend has a
hack that looks like this:

    diff-tree frontend
    -> diff_tree_sha1()
       . populate diff_queued_diff
       . if --follow is in effect and there is only one change that
         creates the target path, then
       -> try_to_follow_renames()
	  -> diff_tree_sha1() with no pathspec but with -C
	  -> diffcore_std() to find renames
	  . if rename is found, tweak diff_queued_diff and put a
	    single filepair that records the found rename there
    -> diffcore_std()
       . tweak elements on diff_queued_diff by
       - rename detection
       - path ordering
       - pickaxe filtering

We need to skip parts of the second call to diffcore_std() that is related
to rename detection, and do so only when try_to_follow_renames() did find
a rename.  Earlier 1da6175 (Make diffcore_std only can run once before a
diff_flush, 2010-05-06) tried to deal with this issue incorrectly; it
unconditionally disabled any second call to diffcore_std().

This hopefully fixes the breakage.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-13 12:17:45 -07:00
Junio C Hamano 39f75d26e2 diff --follow: do not waste cycles while recursing
The "--follow" logic is called from diff_tree_sha1() function, but the
input trees to diff_tree_sha1() are not necessarily the top-level trees
(compare_tree_entry() calls it while it recursively descends into
subtrees).  When a newly created path lives in somewhere deep in the
source hierarchy, e.g. "platform/", but the rename source is in a totally
different place in the destination hierarchy, e.g. "lang-api/src/com/...",
running "try_to_find_renames()" while base is set to "platform/" is a
wasted call.

We only need to run the rename following at the very top level.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-13 11:01:51 -07:00
Bo Yang 0cdca133ec Make git log --follow find copies among unmodified files.
'git log --follow <path>' don't track copies from unmodified
files, and this patch fix it.

Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-07 09:34:29 -07:00
Jens Lehmann e3d42c4773 Performance optimization for detection of modified submodules
In the worst case is_submodule_modified() got called three times for
each submodule. The information we got from scanning the whole
submodule tree the first time can be reused instead.

New parameters have been added to diff_change() and diff_addremove(),
the information is stored in a new member of struct diff_filespec. Its
value is then reused instead of calling is_submodule_modified() again.

When no explicit "-dirty" is needed in the output the call to
is_submodule_modified() is not necessary when the submodules HEAD
already disagrees with the ref of the superproject, as this alone
marks it as modified. To achieve that, get_stat_data() got an extra
argument.

Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-18 17:28:21 -08:00
Junio C Hamano 90b1994170 diff: Rename QUIET internal option to QUICK
The option "QUIET" primarily meant "find if we have _any_ difference as
quick as possible and report", which means we often do not even have to
look at blobs if we know the trees are different by looking at the higher
level (e.g. "diff-tree A B").  As a side effect, because there is no point
showing one change that we happened to have found first, it also enables
NO_OUTPUT and EXIT_WITH_STATUS options, making the end result look quiet.

Rename the internal option to QUICK to reflect this better; it also makes
grepping the source tree much easier, as there are other kinds of QUIET
option everywhere.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-29 10:22:39 -07:00
Junio C Hamano f245194f9a diff: change semantics of "ignore whitespace" options
Traditionally, the --ignore-whitespace* options have merely meant to tell
the diff output routine that some class of differences are not worth
showing in the textual diff output, so that the end user has easier time
to review the remaining (presumably more meaningful) changes.  These
options never affected the outcome of the command, given as the exit
status when the --exit-code option was in effect (either directly or
indirectly).

When you have only whitespace changes, however, you might expect

	git diff -b --exit-code

to report that there is _no_ change with zero exit status.

Change the semantics of --ignore-whitespace* options to mean more than
"omit showing the difference in text".

The exit status, when --exit-code is in effect, is computed by checking if
we found any differences at the path level, while diff frontends feed
filepairs to the diffcore engine.  When "ignore whitespace" options are in
effect, we defer this determination until the very end of diffcore
transformation.  We simply do not know until the textual diff is
generated, which comes very late in the pipeline.

When --quiet is in effect, various diff frontends optimize by breaking out
early from the loop that enumerates the filepairs, when we find the first
path level difference; when --ignore-whitespace* is used the above change
automatically disables this optimization.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-29 10:22:39 -07:00
Junio C Hamano 4197195bee Merge branch 'ne/maint-1.6.0-diff-tree-t-r-show-directory'
* ne/maint-1.6.0-diff-tree-t-r-show-directory:
  diff-tree -r -t: include added/removed directories in the output
2009-07-01 19:40:47 -07:00
Nick Edelen df533f34a3 diff-tree -r -t: include added/removed directories in the output
We used to include only the modified and typechanged directories
in the ouptut, but for consistency's sake, we should also include
added and removed ones as well.

This makes the output more consistent, but it may break existing scripts
that expect to see the current output which has long been the established
behaviour.

Signed-off-by: Nick Edelen <sirnot@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-13 17:06:09 -07:00
Mike Ralphson 3ea3c215c0 Fix typos / spelling in comments
Signed-off-by: Mike Ralphson <mike@abacus.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-22 19:02:12 -07:00
Björn Steinbrink f0946cb826 tree_entry_interesting: a pathspec only matches at directory boundary
Previously the code did a simple prefix match, which means that a
path in a directory "frotz/" would have matched with pathspec "f".

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-01 19:35:16 -07:00
Junio C Hamano 7e44c93558 'git foo' program identifies itself without dash in die() messages
This is a mechanical conversion of all '*.c' files with:

	s/((?:die|error|warning)\("git)-(\S+:)/$1 $2/;

The result was manually inspected and no false positive was found.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-31 09:39:19 -07:00
Dmitry Potapov fd55a19eb1 Fix buffer overflow in git diff
If PATH_MAX on your system is smaller than a path stored, it may cause
buffer overflow and stack corruption in diff_addremove() and diff_change()
functions when running git-diff

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-16 14:03:24 -07:00
Mike Hommey 03b69c7606 Fix small memory leaks induced by diff_tree_setup_paths
Run diff_tree_release_paths in the appropriate places, and add a test to
avoid NULL dereference. Better safe than sorry.

Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-12-12 10:59:22 -08:00
Pierre Habouzit 8f67f8aefb Make the diff_options bitfields be an unsigned with explicit masks.
reverse_diff was a bit-value in disguise, it's merged in the flags now.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-11 16:54:15 -08:00
Linus Torvalds 6dd4b66fde Fix diffcore-break total breakage
Ok, so on the kernel list, some people noticed that "git log --follow"
doesn't work too well with some files in the x86 merge, because a lot of
files got renamed in very special ways.

In particular, there was a pattern of doing single commits with renames
that looked basically like

 - rename "filename.h" -> "filename_64.h"
 - create new "filename.c" that includes "filename_32.h" or
   "filename_64.h" depending on whether we're 32-bit or 64-bit.

which was preparatory for smushing the two trees together.

Now, there's two issues here:

 - "filename.c" *remained*. Yes, it was a rename, but there was a new file
   created with the old name in the same commit. This was important,
   because we wanted each commit to compile properly, so that it was
   bisectable, so splitting the rename into one commit and the "create
   helper file" into another was *not* an option.

   So we need to break associations where the contents change too much.
   Fine. We have the -B flag for that. When we break things up, then the
   rename detection will be able to figure out whether there are better
   alternatives.

 - "git log --follow" didn't with with -B.

Now, the second case was really simple: we use a different "diffopt"
structure for the rename detection than the basic one (which we use for
showing the diffs). So that second case is trivially fixed by a trivial
one-liner that just copies the break_opt values from the "real" diffopts
to the one used for rename following. So now "git log -B --follow" works
fine:

	diff --git a/tree-diff.c b/tree-diff.c
	index 26bdbdd..7c261fd 100644
	--- a/tree-diff.c
	+++ b/tree-diff.c
	@@ -319,6 +319,7 @@ static void try_to_follow_renames(struct tree_desc *t1, struct tree_desc *t2, co
	 	diff_opts.detect_rename = DIFF_DETECT_RENAME;
	 	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
	 	diff_opts.single_follow = opt->paths[0];
	+	diff_opts.break_opt = opt->break_opt;
	 	paths[0] = NULL;
	 	diff_tree_setup_paths(paths, &diff_opts);
	 	if (diff_setup_done(&diff_opts) < 0)

however, the end result does *not* work. Because our diffcore-break.c
logic is totally bogus!

In particular:

 - it used to do

	if (base_size < MINIMUM_BREAK_SIZE)
		return 0; /* we do not break too small filepair */

   which basically says "don't bother to break small files". But that
   "base_size" is the *smaller* of the two sizes, which means that if some
   large file was rewritten into one that just includes another file, we
   would look at the (small) result, and decide that it's smaller than the
   break size, so it cannot be worth it to break it up! Even if the other
   side was ten times bigger and looked *nothing* like the samell file!

   That's clearly bogus. I replaced "base_size" with "max_size", so that
   we compare the *bigger* of the filepair with the break size.

 - It calculated a "merge_score", which was the score needed to merge it
   back together if nothing else wanted it. But even if it was *so*
   different that we would never want to merge it back, we wouldn't
   consider it a break! That makes no sense. So I added

	if (*merge_score_p > break_score)
		return 1;

   to make it clear that if we wouldn't want to merge it at the end, it
   was *definitely* a break.

 - It compared the whole "extent of damage", counting all inserts and
   deletes, but it based this score on the "base_size", and generated the
   damage score with

	delta_size = src_removed + literal_added;
	damage_score = delta_size * MAX_SCORE / base_size;

   but that makes no sense either, since quite often, this will result in
   a number that is *bigger* than MAX_SCORE! Why? Because base_size is
   (again) the smaller of the two files we compare, and when you start out
   from a small file and add a lot (or start out from a large file and
   remove a lot), the base_size is going to be much smaller than the
   damage!

   Again, the fix was to replace "base_size" with "max_size", at which
   point the damage actually becomes a sane percentage of the whole.

With these changes in place, not only does "git log -B --follow" work for
the case that triggered this in the first place, ie now

	git log -B --follow arch/x86/kernel/vmlinux_64.lds.S

actually gives reasonable results. But I also wanted to verify it in
general, by doing a full-history

	git log --stat -B -C

on my kernel tree with the old code and the new code.

There's some tweaking to be done, but generally, the new code generates
much better results wrt breaking up files (and then finding better rename
candidates). Here's a few examples of the "--stat" output:

 - This:
	include/asm-x86/Kbuild        |    2 -
	include/asm-x86/debugreg.h    |   79 +++++++++++++++++++++++++++++++++++------
	include/asm-x86/debugreg_32.h |   64 ---------------------------------
	include/asm-x86/debugreg_64.h |   65 ---------------------------------
	4 files changed, 68 insertions(+), 142 deletions(-)

      Becomes:

	include/asm-x86/Kbuild                        |    2 -
	include/asm-x86/{debugreg_64.h => debugreg.h} |    9 +++-
	include/asm-x86/debugreg_32.h                 |   64 -------------------------
	3 files changed, 7 insertions(+), 68 deletions(-)

 - This:
	include/asm-x86/bug.h    |   41 +++++++++++++++++++++++++++++++++++++++--
	include/asm-x86/bug_32.h |   37 -------------------------------------
	include/asm-x86/bug_64.h |   34 ----------------------------------
	3 files changed, 39 insertions(+), 73 deletions(-)

      Becomes

	include/asm-x86/{bug_64.h => bug.h} |   20 +++++++++++++-----
	include/asm-x86/bug_32.h            |   37 -----------------------------------
	2 files changed, 14 insertions(+), 43 deletions(-)

Now, in some other cases, it does actually turn a rename into a real
"delete+create" pair, and then the diff is usually bigger, so truth in
advertizing: it doesn't always generate a nicer diff. But for what -B was
meant for, I think this is a big improvement, and I suspect those cases
where it generates a bigger diff are tweakable.

So I think this diff fixes a real bug, but we might still want to tweak
the default values and perhaps the exact rules for when a break happens.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:59:42 -04:00
Linus Torvalds 9f38e1ef7e Fix up "git log --follow" a bit..
This fixes "git log --follow" to hopefully not leak memory any more, and
also cleans it up a bit to look more like some of the other functions that
use "diff_queued_diff" (by *not* using it directly as a global in the
code, but by instead just taking a pointer to the diff queue and using
that).

As to "diff_queued_diff", I think it would be better off not as a global
at all, but as being just an entry in the "struct diff_options" structure,
but that's a separate issue, and there may be some subtle reason for why
it's currently a global.

Anyway, no real changes. Instead of having a magical first entry in the
diff-queue, we now end up just keeping the diff-queue clean, and keeping
our "preferred" file pairing in an internal "choice" variable. That makes
it easy to switch the choice around when we find a better one.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-22 23:37:21 -07:00
Linus Torvalds 750f7b668f Finally implement "git log --follow"
Ok, I've really held off doing this too damn long, because I'm lazy, and I
was always hoping that somebody else would do it.

But no, people keep asking for it, but nobody actually did anything, so I
decided I might as well bite the bullet, and instead of telling people
they could add a "--follow" flag to "git log" to do what they want to do,
I decided that it looks like I just have to do it for them..

The code wasn't actually that complicated, in that the diffstat for this
patch literally says "70 insertions(+), 1 deletions(-)", but I will have
to admit that in order to get to this fairly simple patch, you did have to
know and understand the internal git diff generation machinery pretty
well, and had to really be able to follow how commit generation interacts
with generating patches and generating the log.

So I suspect that while I was right that it wasn't that hard, I might have
been expecting too much of random people - this patch does seem to be
firmly in the core "Linus or Junio" territory.

To make a long story short: I'm sorry for it taking so long until I just
did it.

I'm not going to guarantee that this works for everybody, but you really
can just look at the patch, and after the appropriate appreciative noises
("Ooh, aah") over how clever I am, you can then just notice that the code
itself isn't really that complicated.

All the real new code is in the new "try_to_follow_renames()" function. It
really isn't rocket science: we notice that the pathname we were looking
at went away, so we start a full tree diff and try to see if we can
instead make that pathname be a rename or a copy from some other previous
pathname. And if we can, we just continue, except we show *that*
particular diff, and ever after we use the _previous_ pathname.

One thing to look out for: the "rename detection" is considered to be a
singular event in the _linear_ "git log" output! That's what people want
to do, but I just wanted to point out that this patch is *not* carrying
around a "commit,pathname" kind of pair and it's *not* going to be able to
notice the file coming from multiple *different* files in earlier history.

IOW, if you use "git log --follow", then you get the stupid CVS/SVN kind
of "files have single identities" kind of semantics, and git log will just
pick the identity based on the normal move/copy heuristics _as_if_ the
history could be linearized.

Put another way: I think the model is broken, but given the broken model,
I think this patch does just about as well as you can do. If you have
merges with the same "file" having different filenames over the two
branches, git will just end up picking _one_ of the pathnames at the point
where the newer one goes away. It never looks at multiple pathnames in
parallel.

And if you understood all that, you probably didn't need it explained, and
if you didn't understand the above blathering, it doesn't really mtter to
you. What matters to you is that you can now do

	git log -p --follow builtin-rev-list.c

and it will find the point where the old "rev-list.c" got renamed to
"builtin-rev-list.c" and show it as such.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-22 23:37:11 -07:00
Junio C Hamano 1d848f643c tree_entry_interesting(): allow it to say "everything is interesting"
In addition to optimizing pathspecs that would never match,
which was done earlier, this optimizes pathspecs that would
always match (e.g. "arch/" while the traversal is already in
"arch/i386/" hierarchy).

This patch makes the worst case slightly more palatable, while
improving average case.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-22 00:36:00 -07:00
Junio C Hamano ccc744abbb tree-diff: avoid strncmp()
If we already know that some of the pathspecs can match later
entries in the tree we are looking at, we do not have to do more
expensive strncmp() upfront before comparing the length of the
match pattern and the path, as a path longer than the match
pattern will not match it, and a path shorter than the match
pattern will match only if the path is a directory-component
wise prefix of the match pattern.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-22 00:34:39 -07:00
Junio C Hamano 7d2f667b12 Teach tree_entry_interesting() that the tree entries are sorted.
When we are looking at a tree entry with pathspecs, if all the
pathspecs sort strictly earlier than the entry we are currently
looking at, there is no way later entries in the same tree would
match our pathspecs, because the entries are sorted.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-22 00:29:49 -07:00
Linus Torvalds 6fda5e5180 Initialize tree descriptors with a helper function rather than by hand.
This removes slightly more lines than it adds, but the real reason for
doing this is that future optimizations will require more setup of the
tree descriptor, and so we want to do it in one place.

Also renamed the "desc.buf" field to "desc.buffer" just to trigger
compiler errors for old-style manual initializations, making sure I
didn't miss anything.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21 10:21:57 -07:00
Linus Torvalds 5d86501742 Set up for better tree diff optimizations
This is mainly just a cleanup patch, and sets up for later changes where
the tree-diff.c "interesting()" function can return more than just a
yes/no value.

In particular, it should be quite possible to say "no subsequent entries
in this tree can possibly be interesting any more", and thus allow the
callers to short-circuit the tree entirely.

In fact, changing the callers to do so is trivial, and is really all this
patch really does, because changing "interesting()" itself to say that
nothing further is going to be interesting is definitely more complicated,
considering that we may have arbitrary pathspecs.

But in cleaning up the callers, this actually fixes a potential small
performance issue in diff_tree(): if the second tree has a lot of
uninterestign crud in it, we would keep on doing the "is it interesting?"
check on the first tree for each uninteresting entry in the second one.

The answer is obviously not going to change, so that was just not helping.
The new code is clearer and simpler and avoids this issue entirely.

I also renamed "interesting()" to "tree_entry_interesting()", because I
got frustrated by the fact that

 - we actually had *another* function called "interesting()" in another
   file, and I couldn't tell from the profiles which one was the one that
   mattered more.

 - when rewriting it to return a ternary value, you can't just do

	if (interesting(...))
		...

   any more, but want to assign the return value to a local variable. The
   name of choice for that variable would normally be "interesting", so
   I just wanted to make the function name be more specific, and avoid
   that whole issue (even though I then didn't choose that name for either
   of the users, just to avoid confusion in the patch itself ;)

In other words, this doesn't really change anything, but I think it's a
good thing to do, and if somebody comes along and writes the logic for
"yeah, none of the pathspecs you have are interesting", we now support
that trivially.

It could easily be a meaningful optimization for things like "blame",
where there's just one pathspec, and stopping when you've seen it would
allow you to avoid about 50% of the tree traversals on average.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-19 02:01:39 -07:00
Junio C Hamano d54fe394ac Merge branch 'ar/diff'
* ar/diff:
  Add tests for --quiet option of diff programs
  try-to-simplify-commit: use diff-tree --quiet machinery.
  revision.c: explain what tree_difference does
  Teach --quiet to diff backends.
  diff --quiet
  Remove unused diffcore_std_no_resolve
  Allow git-diff exit with codes similar to diff(1)
2007-03-18 15:48:06 -07:00
Linus Torvalds 304de2d2d6 Avoid unnecessary strlen() calls
This is a micro-optimization that grew out of the mailing list discussion
about "strlen()" showing up in profiles.

We used to pass regular C strings around to the low-level tree walking
routines, and while this worked fine, it meant that we needed to call
strlen() on strings that the caller always actually knew the size of
anyway.

So pass the length of the string down wih the string, and avoid
unnecessary calls to strlen(). Also, when extracting a pathname from a
tree entry, use "tree_entry_len()" instead of strlen(), since the length
of the pathname is directly calculable from the decoded tree entry itself
without having to actually do another strlen().

This shaves off another ~5-10% from some loads that are very tree
intensive (notably doing commit filtering by a pathspec).

Signed-off-by: Linus Torvalds  <torvalds@linux-foundation.org>"
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-18 15:36:59 -07:00
Junio C Hamano 822cac0155 Teach --quiet to diff backends.
This teaches git-diff-files, git-diff-index and git-diff-tree
backends to exit early under --quiet option.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-14 16:21:19 -07:00