Commit graph

90 commits

Author SHA1 Message Date
Junio C Hamano d539de9f25 Merge branch 'jl/diff-submodule-ignore'
* jl/diff-submodule-ignore:
  Teach diff --submodule that modified submodule directory is dirty
  git diff: Don't test submodule dirtiness with --ignore-submodules
  Make ce_uptodate() trustworthy again
2010-01-26 22:53:13 -08:00
Jens Lehmann 4d34477f4c git diff: Don't test submodule dirtiness with --ignore-submodules
The diff family suppresses the output of submodule changes when
requested but checks them nonetheless. But since recently submodules
get examined for their dirtiness, which is rather expensive. There is
no need to do that when the --ignore-submodules option is used, as
the gathered information is never used anyway.

Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-24 21:03:23 -08:00
Junio C Hamano 026680f881 Merge branch 'jc/fix-tree-walk'
* jc/fix-tree-walk:
  read-tree --debug-unpack
  unpack-trees.c: look ahead in the index
  unpack-trees.c: prepare for looking ahead in the index
  Aggressive three-way merge: fix D/F case
  traverse_trees(): handle D/F conflict case sanely
  more D/F conflict tests
  tests: move convenience regexp to match object names to test-lib.sh

Conflicts:
	builtin-read-tree.c
	unpack-trees.c
	unpack-trees.h
2010-01-24 17:35:58 -08:00
Junio C Hamano 125fd98434 Make ce_uptodate() trustworthy again
The rule has always been that a cache entry that is ce_uptodate(ce)
means that we already have checked the work tree entity and we know
there is no change in the work tree compared to the index, and nobody
should have to double check.  Note that false ce_uptodate(ce) does not
mean it is known to be dirty---it only means we don't know if it is
clean.

There are a few codepaths (refresh-index and preload-index are among
them) that mark a cache entry as up-to-date based solely on the return
value from ie_match_stat(); this function uses lstat() to see if the
work tree entity has been touched, and for a submodule entry, if its
HEAD points at the same commit as the commit recorded in the index of
the superproject (a submodule that is not even cloned is considered
clean).

A submodule is no longer considered unmodified merely because its HEAD
matches the index of the superproject these days, in order to prevent
people from forgetting to commit in the submodule and updating the
superproject index with the new submodule commit, before commiting the
state in the superproject.  However, the patch to do so didn't update
the codepath that marks cache entries up-to-date based on the updated
definition and instead worked it around by saying "we don't trust the
return value of ce_uptodate() for submodules."

This makes ce_uptodate() trustworthy again by not marking submodule
entries up-to-date.

The next step _could_ be to introduce a few "in-core" flag bits to
cache_entry structure to record "this entry is _known_ to be dirty",
call is_submodule_modified() from ie_match_stat(), and use these new
bits to avoid running this rather expensive check more than once, but
that can be a separate patch.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-24 00:15:29 -08:00
Jens Lehmann e3d42c4773 Performance optimization for detection of modified submodules
In the worst case is_submodule_modified() got called three times for
each submodule. The information we got from scanning the whole
submodule tree the first time can be reused instead.

New parameters have been added to diff_change() and diff_addremove(),
the information is stored in a new member of struct diff_filespec. Its
value is then reused instead of calling is_submodule_modified() again.

When no explicit "-dirty" is needed in the output the call to
is_submodule_modified() is not necessary when the submodules HEAD
already disagrees with the ref of the superproject, as this alone
marks it as modified. To achieve that, get_stat_data() got an extra
argument.

Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-18 17:28:21 -08:00
Jens Lehmann ee6fc514f2 Show submodules as modified when they contain a dirty work tree
Until now a submodule only then showed up as modified in the supermodule
when the last commit in the submodule differed from the one in the index
or the diffed against commit of the superproject. A dirty work tree
containing new untracked or modified files in a submodule was
undetectable when looking at it from the superproject.

Now git status and git diff (against the work tree) in the superproject
will also display submodules as modified when they contain untracked or
modified files, even if the compared ref matches the HEAD of the
submodule.

Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Nanako Shiraishi <nanako3@lavabit.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-16 16:40:50 -08:00
Junio C Hamano 73d66323ac Merge branch 'nd/sparse'
* nd/sparse: (25 commits)
  t7002: test for not using external grep on skip-worktree paths
  t7002: set test prerequisite "external-grep" if supported
  grep: do not do external grep on skip-worktree entries
  commit: correctly respect skip-worktree bit
  ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID
  tests: rename duplicate t1009
  sparse checkout: inhibit empty worktree
  Add tests for sparse checkout
  read-tree: add --no-sparse-checkout to disable sparse checkout support
  unpack-trees(): ignore worktree check outside checkout area
  unpack_trees(): apply $GIT_DIR/info/sparse-checkout to the final index
  unpack-trees(): "enable" sparse checkout and load $GIT_DIR/info/sparse-checkout
  unpack-trees.c: generalize verify_* functions
  unpack-trees(): add CE_WT_REMOVE to remove on worktree alone
  Introduce "sparse checkout"
  dir.c: export excluded_1() and add_excludes_from_file_1()
  excluded_1(): support exclude files in index
  unpack-trees(): carry skip-worktree bit over in merged_entry()
  Read .gitignore from index if it is skip-worktree
  Avoid writing to buffer in add_excludes_from_file_1()
  ...

Conflicts:
	.gitignore
	Documentation/config.txt
	Documentation/git-update-index.txt
	Makefile
	entry.c
	t/t7002-grep.sh
2010-01-13 11:58:34 -08:00
Junio C Hamano 730f72840c unpack-trees.c: look ahead in the index
This makes the traversal of index be in sync with the tree traversal.
When unpack_callback() is fed a set of tree entries from trees, it
inspects the name of the entry and checks if the an index entry with
the same name could be hiding behind the current index entry, and

 (1) if the name appears in the index as a leaf node, it is also
     fed to the n_way_merge() callback function;

 (2) if the name is a directory in the index, i.e. there are entries in
     that are underneath it, then nothing is fed to the n_way_merge()
     callback function;

 (3) otherwise, if the name comes before the first eligible entry in the
     index, the index entry is first unpacked alone.

When traverse_trees_recursive() descends into a subdirectory, the
cache_bottom pointer is moved to walk index entries within that directory.

All of these are omitted for diff-index, which does not even want to be
fed an index entry and a tree entry with D/F conflicts.

This fixes 3-way read-tree and exposes a bug in other parts of the system
in t6035, test #5.  The test prepares these three trees:

 O = HEAD^
    100644 blob e69de29bb2    a/b-2/c/d
    100644 blob e69de29bb2    a/b/c/d
    100644 blob e69de29bb2    a/x

 A = HEAD
    100644 blob e69de29bb2    a/b-2/c/d
    100644 blob e69de29bb2    a/b/c/d
    100644 blob 587be6b4c3f93f93c489c0111bba5596147a26cb    a/x

 B = master
    120000 blob a36b77384451ea1de7bd340ffca868249626bc52    a/b
    100644 blob e69de29bb2    a/b-2/c/d
    100644 blob e69de29bb2    a/x

With a clean index that matches HEAD, running

    git read-tree -m -u --aggressive $O $A $B

now yields

    120000 a36b77384451ea1de7bd340ffca868249626bc52 3       a/b
    100644 e69de29bb2 0       a/b-2/c/d
    100644 e69de29bb2 1       a/b/c/d
    100644 e69de29bb2 2       a/b/c/d
    100644 587be6b4c3f93f93c489c0111bba5596147a26cb 0       a/x

which is correct.  "master" created "a/b" symlink that did not exist,
and removed "a/b/c/d" while HEAD did not do touch either path.

Before this series, read-tree did not notice the situation and resolved
addition of "a/b" and removal of "a/b/c/d" independently.  If A = HEAD had
another path "a/b/c/e" added, this merge should conflict but instead it
silently resolved "a/b" and then immediately overwrote it to add
"a/b/c/e", which was quite bogus.

Tests in t1012 start to work with this.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-07 15:00:14 -08:00
Junio C Hamano da165f470e unpack-trees.c: prepare for looking ahead in the index
This prepares but does not yet implement a look-ahead in the index entries
when traverse-trees.c decides to give us tree entries in an order that
does not match what is in the index.

A case where a look-ahead in the index is necessary happens when merging
branch B into branch A while the index matches the current branch A, using
a tree O as their common ancestor, and these three trees looks like this:

   O        A       B
   t                t
   t-i      t-i     t-i
   t-j      t-j
            t/1
            t/2

The traverse_trees() function gets "t", "t-i" and "t" from trees O, A and
B first, and notices that A may have a matching "t" behind "t-i" and "t-j"
(indeed it does), and tells A to give that entry instead.  After unpacking
blob "t" from tree B (as it hasn't changed since O in B and A removed it,
it will result in its removal), it descends into directory "t/".

The side that walked index in parallel to the tree traversal used to be
implemented with one pointer, o->pos, that points at the next index entry
to be processed.  When this happens, the pointer o->pos still points at
"t-i" that is the first entry.  We should be able to skip "t-i" and "t-j"
and locate "t/1" from the index while the recursive invocation of
traverse_trees() walks and match entries found there, and later come back
to process "t-i".

While that look-ahead is not implemented yet, this adds a flag bit,
CE_UNPACKED, to mark the entries in the index that has already been
processed.  o->pos pointer has been renamed to o->cache_bottom and it
points at the first entry that may still need to be processed.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-07 14:59:54 -08:00
Junio C Hamano 3cc3fb7df6 Merge branch 'jc/1.7.0-diff-whitespace-only-status'
* jc/1.7.0-diff-whitespace-only-status:
  diff.c: fix typoes in comments
  Make test case number unique
  diff: Rename QUIET internal option to QUICK
  diff: change semantics of "ignore whitespace" options

Conflicts:
	diff.h
2009-12-26 14:03:18 -08:00
Junio C Hamano da8ba5e7da diff-lib.c: fix misleading comments on oneway_diff()
20a16eb (unpack_trees(): fix diff-index regression., 2008-03-10) adjusted
diff-index to the new world order since 34110cd (Make 'unpack_trees()'
have a separate source and destination index, 2008-03-06).  Callbacks are
expected to return anything non-negative as "success", and instead of
reporting how many index entries they have processed, they are expected to
advance o->pos themselves.  The code did so, but a stale comment was left
behind.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-11 16:40:43 -07:00
Junio C Hamano 433233e0b6 Merge branch 'jc/shortstatus'
* jc/shortstatus:
  git commit --dry-run -v: show diff in color when asked
  Documentation/git-commit.txt: describe --dry-run
  wt-status: collect untracked files in a separate "collect" phase
  Make git_status_config() file scope static to builtin-commit.c
  wt-status: move wt_status_colors[] into wt_status structure
  wt-status: move many global settings to wt_status structure
  commit: --dry-run
  status: show worktree status of conflicted paths separately
  wt-status.c: rework the way changes to the index and work tree are summarized
  diff-index: keep the original index intact
  diff-index: report unmerged new entries
2009-08-28 19:38:19 -07:00
Nguyễn Thái Ngọc Duy b4d1690df1 Teach Git to respect skip-worktree bit (reading part)
grep: turn on --cached for files that is marked skip-worktree
ls-files: do not check for deleted file that is marked skip-worktree
update-index: ignore update request if it's skip-worktree, while still allows removing
diff*: skip worktree version

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-23 17:13:32 -07:00
Nguyễn Thái Ngọc Duy 540e694b13 Prevent diff machinery from examining assume-unchanged entries on worktree
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-11 23:17:55 -07:00
Junio C Hamano 26da1d7867 diff-index: keep the original index intact
When comparing the index and a tree, we used to read the contents of the
tree into stage #1 of the index and compared them with stage #0.  In order
not to lose sight of entries originally unmerged in the index, we hoisted
them to stage #3 before reading the tree.

Commit d1f2d7e (Make run_diff_index() use unpack_trees(), not read_tree(),
2008-01-19) changed all this.  These days, we instead use unpack_trees()
API to traverse the tree and compare the contents with the index, without
modifying the index at all.  There is no reason to hoist the unmerged
entries to stage #3 anymore.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-05 02:21:11 -07:00
Junio C Hamano 29796c6ccf diff-index: report unmerged new entries
Since an earlier change to diff-index by d1f2d7e (Make run_diff_index()
use unpack_trees(), not read_tree(), 2008-01-19), we stopped reporting an
unmerged path that does not exist in the tree, but we should.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-05 02:21:11 -07:00
Junio C Hamano 90b1994170 diff: Rename QUIET internal option to QUICK
The option "QUIET" primarily meant "find if we have _any_ difference as
quick as possible and report", which means we often do not even have to
look at blobs if we know the trees are different by looking at the higher
level (e.g. "diff-tree A B").  As a side effect, because there is no point
showing one change that we happened to have found first, it also enables
NO_OUTPUT and EXIT_WITH_STATUS options, making the end result look quiet.

Rename the internal option to QUICK to reflect this better; it also makes
grepping the source tree much easier, as there are other kinds of QUIET
option everywhere.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-29 10:22:39 -07:00
Junio C Hamano c28a17f270 Merge branch 'jc/cache-tree'
* jc/cache-tree:
  Avoid "diff-index --cached" optimization under --find-copies-harder
  Optimize "diff-index --cached" using cache-tree
  t4007: modernize the style
  cache-tree.c::cache_tree_find(): simplify internal API
  write-tree --ignore-cache-tree
2009-06-20 21:47:30 -07:00
Junio C Hamano a0919ced8a Avoid "diff-index --cached" optimization under --find-copies-harder
When find-copies-harder is in effect, the diff frontends are expected to
feed all paths, not just changed paths, to the diffcore, so that copy
sources can be picked up.  In such a case, not descending into subtrees
using the cache-tree information is simply wrong.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-25 11:35:29 -07:00
Junio C Hamano b65982b608 Optimize "diff-index --cached" using cache-tree
When running "diff-index --cached" after making a change to only a small
portion of the index, there is no point unpacking unchanged subtrees into
the index recursively, only to find that all entries match anyway.  Tweak
unpack_trees() logic that is used to read in the tree object to catch the
case where the tree entry we are looking at matches the index as a whole
by looking at the cache-tree.

As an exercise, after modifying a few paths in the kernel tree, here are
a few numbers on my Athlon 64X2 3800+:

    (without patch, hot cache)
    $ /usr/bin/time git diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.07user 0.02system 0:00.09elapsed 102%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+9407minor)pagefaults 0swaps

    (with patch, hot cache)
    $ /usr/bin/time ../git.git/git-diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.02user 0.00system 0:00.02elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+2446minor)pagefaults 0swaps

Cold cache numbers are very impressive, but it does not matter very much
in practice:

    (without patch, cold cache)
    $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
    $ /usr/bin/time git diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.06user 0.17system 0:10.26elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    247032inputs+0outputs (1172major+8237minor)pagefaults 0swaps

    (with patch, cold cache)
    $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
    $ /usr/bin/time ../git.git/git-diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.02user 0.01system 0:01.01elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    18440inputs+0outputs (79major+2369minor)pagefaults 0swaps

This of course helps "git status" as well.

    (without patch, hot cache)
    $ /usr/bin/time ../git.git/git-status >/dev/null
    0.17user 0.18system 0:00.35elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+5336outputs (0major+10970minor)pagefaults 0swaps

    (with patch, hot cache)
    $ /usr/bin/time ../git.git/git-status >/dev/null
    0.10user 0.16system 0:00.27elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+5336outputs (0major+3921minor)pagefaults 0swaps

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-25 11:35:29 -07:00
Junio C Hamano 3ed24211d4 Merge branch 'lt/maint-diff-reduce-lstat'
* lt/maint-diff-reduce-lstat:
  Teach 'git checkout' to preload the index contents
  Avoid unnecessary 'lstat()' calls in 'get_stat_data()'
2009-05-23 01:40:33 -07:00
Linus Torvalds 658dd48c85 Avoid unnecessary 'lstat()' calls in 'get_stat_data()'
When we ask get_stat_data() to get the mode and size of an index entry,
we can avoid the lstat() call if we have marked the index entry as being
uptodate due to earlier lstat() calls.

This avoids a lot of unnecessary lstat() calls in eg 'git checkout',
where the last phase shows the differences to the working tree
(requiring a diff), but earlier phases have already verified the index.

On the kernel repo (with a fast machine and everything cached), this
changes timings of a nul 'git checkout' from

 - Before (best of ten):

	0.14user 0.05system 0:00.19elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+13237minor)pagefaults 0swaps

 - After
	0.11user 0.03system 0:00.15elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
	0inputs+0outputs (0major+13235minor)pagefaults 0swaps

so it can obviously be noticeable, although equally obviously it's not a
show-stopper on this particular machine. The difference is likely larger
on slower machines, or with operating systems that don't do as good a job
of name caching.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-09 20:42:19 -07:00
Junio C Hamano a9bfe81309 Merge branch 'kb/checkout-optim'
* kb/checkout-optim:
  Revert "lstat_cache(): print a warning if doing ping-pong between cache types"
  checkout bugfix: use stat.mtime instead of stat.ctime in two places
  Makefile: Set compiler switch for USE_NSEC
  Create USE_ST_TIMESPEC and turn it on for Darwin
  Not all systems use st_[cm]tim field for ns resolution file timestamp
  Record ns-timestamps if possible, but do not use it without USE_NSEC
  write_index(): update index_state->timestamp after flushing to disk
  verify_uptodate(): add ce_uptodate(ce) test
  make USE_NSEC work as expected
  fix compile error when USE_NSEC is defined
  check_updates(): effective removal of cache entries marked CE_REMOVE
  lstat_cache(): print a warning if doing ping-pong between cache types
  show_patch_diff(): remove a call to fstat()
  write_entry(): use fstat() instead of lstat() when file is open
  write_entry(): cleanup of some duplicated code
  create_directories(): remove some memcpy() and strchr() calls
  unlink_entry(): introduce schedule_dir_for_removal()
  lstat_cache(): swap func(length, string) into func(string, length)
  lstat_cache(): generalise longest_match_lstat_cache()
  lstat_cache(): small cleanup and optimisation
2009-03-17 18:54:31 -07:00
Stephan Beyer 75f3ff2eea Generalize and libify index_is_dirty() to index_differs_from(...)
index_is_dirty() in builtin-revert.c checks if the index is dirty.
This patch generalizes this function to check if the index differs
from a revision, i.e. the former index_is_dirty() behavior can now be
achieved by index_differs_from("HEAD", 0).

The second argument "diff_flags" allows to set further diff option
flags like DIFF_OPT_IGNORE_SUBMODULES. See DIFF_OPT_* macros in diff.h
for a list.

index_differs_from() seems to be useful for more than builtin-revert.c,
so it is moved into diff-lib.c and also used in builtin-commit.c.

Yet to mention:

 - "rev.abbrev = 0;" can be safely removed.
   This has no impact on performance or functioning of neither
   setup_revisions() nor run_diff_index().

 - rev.pending.objects is free()d because this fixes a leak.
   (Also see 295dd2ad "Fix memory leak in traverse_commit_list")

Mentored-by: Daniel Barkalow <barkalow@iabervon.org>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-10 22:25:39 -08:00
Kjetil Barvik 571998921d lstat_cache(): swap func(length, string) into func(string, length)
Swap function argument pair (length, string) into (string, length) to
conform with the commonly used order inside the GIT source code.

Also, add a note about this fact into the coding guidelines.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-09 20:59:26 -08:00
Kjetil Barvik ff7e6aad6d Cleanup of unused symcache variable inside diff-lib.c
Commit c40641b77b, 'Optimize
symlink/directory detection' by Linus Torvalds, removed the 'char
*symcache' parameter to the has_symlink_leading_path() function.  This
made all variables currently named 'symcache' inside diff-lib.c
unnecessary.

This also let us throw away the 'struct oneway_unpack_data', and
instead directly use the 'struct rev_info *revs' member, which
was the only member left after removal of the 'symcache[] array'
member.  The 'struct oneway_unpack_data' was introduced by the
following commit:

  948dd346  "diff-files: careful when inspecting work tree items"

Impact: cleanup
        PATH_MAX bytes less memory stack usage in some cases

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-11 15:56:55 -08:00
Junio C Hamano a5a818ee48 diff: vary default prefix depending on what are compared
With a new configuration "diff.mnemonicprefix", "git diff" shows the
differences between various combinations of preimage and postimage trees
with prefixes different from the standard "a/" and "b/".  Hopefully this
will make the distinction stand out for some people.

    "git diff" compares the (i)ndex and the (w)ork tree;
    "git diff HEAD" compares a (c)ommit and the (w)ork tree;
    "git diff --cached" compares a (c)ommit and the (i)ndex;
    "git-diff HEAD:file1 file2" compares an (o)bject and a (w)ork tree entity;
    "git diff --no-index a b" compares two non-git things (1) and (2).

Because these mnemonics now have meanings, they are swapped when reverse
diff is in effect and this feature is enabled.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-30 20:53:24 -07:00
Dmitry Potapov fd55a19eb1 Fix buffer overflow in git diff
If PATH_MAX on your system is smaller than a path stored, it may cause
buffer overflow and stack corruption in diff_addremove() and diff_change()
functions when running git-diff

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-16 14:03:24 -07:00
Junio C Hamano 0569e9b8ce "git diff": do not ignore index without --no-index
Even if "foo" and/or "bar" does not exist in index, "git diff foo bar"
should not change behaviour drastically from "git diff foo bar baz" or
"git diff foo".  A feature that "sometimes works and is handy" is an
unreliable cute hack.

"git diff foo bar" outside a git repository continues to work as a more
colourful alternative to "diff -u" as before.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-24 00:16:39 -07:00
Linus Torvalds c40641b77b Optimize symlink/directory detection
This is the base for making symlink detection in the middle fo a pathname
saner and (much) more efficient.

Under various loads, we want to verify that the full path leading up to a
filename is a real directory tree, and that when we successfully do an
'lstat()' on a filename, we don't get a false positive due to a symlink in
the middle of the path that git should have seen as a symlink, not as a
normal path component.

The 'has_symlink_leading_path()' function already did this, and cached
a single level of symlink information, but didn't cache the _lack_ of a
symlink, so the normal behaviour was actually the wrong way around, and we
ended up doing an 'lstat()' on each path component to check that it was a
real directory.

This caches the last detected full directory and symlink entries, and
speeds up especially deep directory structures a lot by avoiding to
lstat() all the directories leading up to each entry in the index.

[ This can - and should - probably be extended upon so that we eventually
  never do a bare 'lstat()' on any path entries at *all* when checking the
  index, but always check the full path carefully. Right now we do not
  generally check the whole path for all our normal quick index
  revalidation.

  We should also make sure that we're careful about all the invalidation,
  ie when we remove a link and replace it by a directory we should
  invalidate the symlink cache if it matches (and vice versa for the
  directory cache).

  But regardless, the basic function needs to be sane to do that. The old
  'has_symlink_leading_path()' was not capable enough - or indeed the code
  readable enough - to really do that sanely. So I'm pushing this as not
  just an optimization, but as a base for further work. ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-10 18:16:31 -07:00
Junio C Hamano 2855e70ad1 Merge branch 'py/diff-submodule'
* py/diff-submodule:
  is_racy_timestamp(): do not check timestamp for gitlinks
  diff-lib.c: rename check_work_tree_entity()
  diff: a submodule not checked out is not modified
  Add t7506 to test submodule related functions for git-status
  t4027: test diff for submodule with empty directory
2008-05-10 18:16:25 -07:00
Junio C Hamano 867fa20fe9 Merge branch 'jc/lstat'
* jc/lstat:
  diff-files: mark an index entry we know is up-to-date as such
  write_index(): optimize ce_smudge_racily_clean_entry() calls with CE_UPTODATE
2008-05-05 19:16:26 -07:00
Junio C Hamano 451244d724 diff-lib.c: rename check_work_tree_entity()
The function is about checking for removed work tree item, so name it
accordingly to avoid future confusion.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-04 17:41:27 -07:00
Junio C Hamano 1392a37721 diff: a submodule not checked out is not modified
948dd34 (diff-index: careful when inspecting work tree items, 2008-03-30)
made the work tree check careful not to be fooled by a new directory that
exists at a place the index expects a blob.  For such a change to be a
typechange from blob to submodule, the new directory has to be a
repository.

However, if the index expects a submodule there, we should not insist the
work tree entity to be a repository --- a simple directory that is not a
full fledged repository (even an empty directory would do) should be
considered an unmodified subproject, because that is how a superproject
with a submodule is checked out sparsely by default.

This makes the function check_work_tree_entity() even more careful not to
report a submodule that is not checked out as removed.  It fixes the
recently added test in t4027.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-04 17:41:27 -07:00
Matthieu Moy 59b0c24daa git-svn: detect and fail gracefully when dcommitting to a void
The command

  git svn clone (URL of an empty SVN repo here)

works, creates an empty git repository. I can perform the initial
commit there, but then, "git svn dcommit" says :

Use of uninitialized value in concatenation (.) or string at .../git-svn line 414.
Committing to  ...
Unable to determine upstream SVN information from HEAD history

I guess a correct management of the initial commit in git-svn would be
hard to implement, but at least, the error message can be improved.
First step is something like the patch below, and better would be for
"git svn clone" to warn that it won't be able to do much with the
cloned repo.

Acked-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-04-29 23:11:57 -07:00
Junio C Hamano 8fa29602d4 diff-files: mark an index entry we know is up-to-date as such
This does not make any difference when running diff-files alone, but if
you internally run run_diff_files() and then run other operations further
on the index, we do not have to run lstat(2) again on entries we already
have checked.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-04-12 19:42:17 -07:00
Junio C Hamano f58dbf23c3 diff-files: careful when inspecting work tree items
This fixes the same breakage in diff-files.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-30 22:22:09 -07:00
Junio C Hamano 948dd346fd diff-index: careful when inspecting work tree items
Earlier, if you changed a staged path into a directory in the work tree,
we happily ran lstat(2) on it and found that it exists, and declared that
the user changed it to a gitlink.

This is wrong for two reasons:

 (1) It may be a directory, but it may not be a submodule, and in the
     latter case, the change we need to report is "the blob at the path
     has disappeared".  We need to check with resolve_gitlink_ref() to be
     consistent with what "git add" and "git update-index --add" does.

 (2) lstat(2) may have succeeded only because a leading component of the
     path was turned into a symbolic link that points at something that
     exists in the work tree.  In such a case, the path itself does not
     exist anymore, as far as the index is concerned.

This fixes these breakages in diff-index that the previous patch has
exposed.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-30 22:22:09 -07:00
Linus Torvalds 20a16eb33e unpack_trees(): fix diff-index regression.
When skip_unmerged option is not given, unpack_trees() should not just
skip unmerged cache entries but keep them in the result for the caller to
sort them out.

For callers other than diff-index, the incoming index should never be
unmerged, but diff-index is a special case caller.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-10 23:51:13 -07:00
Linus Torvalds 34110cd4e3 Make 'unpack_trees()' have a separate source and destination index
We will always unpack into our own internal index, but we will take the
source from wherever specified, and we will optionally write the result
to a specified index (optionally, because not everybody even _wants_ any
result: the index diffing really wants to just walk the tree and index
in parallel).

This ends up removing a fair number more lines than it adds, for the
simple reason that we can now skip all the crud that tried to be
oh-so-careful about maintaining our position in the index as we were
traversing and modifying it.  Since we don't actually modify the source
index any more, we can just update the 'o->pos' pointer without worrying
about whether an index entry got removed or replaced or added to.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-09 01:03:38 -08:00
Linus Torvalds bc052d7f43 Make 'unpack_trees()' take the index to work on as an argument
This is just a very mechanical conversion, and makes everybody set it to
'&the_index' before calling, but at least it makes it more explicit
where we work with the index.

The next stage would be to split that index usage up into a 'source' and
a 'destination' index, so that we can unpack into a different index than
we started out from.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-09 00:43:48 -08:00
Junio C Hamano c8c16f2865 diff-lib.c: constness strengthening
The internal implementation of diff-index codepath used to use non const
pointer to pass sha1 around, but it did not have to.  With this, we can
also lose the private no_sha1[] array, as we can use the public null_sha1[]
array that exists exactly for the same purpose.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-02 01:00:30 -08:00
Daniel Barkalow 203a2fe117 Allow callers of unpack_trees() to handle failure
Return an error from unpack_trees() instead of calling die(), and exit
with an error in read-tree, builtin-commit, and diff-lib. merge-recursive
already expected an error return from unpack_trees, so it doesn't need to
be changed. The merge function can return negative to abort.

This will be used in builtin-checkout -m.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
2008-02-09 23:16:51 -08:00
Johannes Schindelin 204ce979a5 Also use unpack_trees() in do_diff_cache()
As in run_diff_index(), we call unpack_trees() with the oneway_diff()
function in do_diff_cache() now.  This makes the function diff_cache()
obsolete.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-21 13:09:07 -08:00
Linus Torvalds d1f2d7e8ca Make run_diff_index() use unpack_trees(), not read_tree()
A plain "git commit" would still run lstat() a lot more than necessary,
because wt_status_print() would cause the index to be repeatedly flushed
and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit
to be lost, resulting in the files in the index being lstat'ed three
times each.

The reason why wt-status.c ended up invalidating and re-reading the
cache multiple times was that it uses "run_diff_index()", which in turn
uses "read_tree()" to populate the index with *both* the old index and
the tree we want to compare against.

So this patch re-writes run_diff_index() to not use read_tree(), but
instead use "unpack_trees()" to diff the index to a tree.  That, in
turn, means that we don't need to modify the index itself, which then
means that we don't need to invalidate it and re-read it!

This, together with the lstat() optimizations, means that "git commit"
on the kernel tree really only needs to lstat() the index entries once.
That noticeably cuts down on the cached timings.

Best time before:

	[torvalds@woody linux]$ time git commit > /dev/null
	real    0m0.399s
	user    0m0.232s
	sys     0m0.164s

Best time after:

	[torvalds@woody linux]$ time git commit > /dev/null
	real    0m0.254s
	user    0m0.140s
	sys     0m0.112s

so it's a noticeable improvement in addition to being a nice conceptual
cleanup (it's really not that pretty that "run_diff_index()" dirties the
index!)

Doing an "strace -c" on it also shows that as it cuts the number of
lstat() calls by two thirds, it goes from being lstat()-limited to being
limited by getdents() (which is the readdir system call):

Before:
	% time     seconds  usecs/call     calls    errors syscall
	------ ----------- ----------- --------- --------- ----------------
	 60.69    0.000704           0     69230        31 lstat
	 23.62    0.000274           0      5522           getdents
	  8.36    0.000097           0      5508      2638 open
	  2.59    0.000030           0      2869           close
	  2.50    0.000029           0       274           write
	  1.47    0.000017           0      2844           fstat

After:
	% time     seconds  usecs/call     calls    errors syscall
	------ ----------- ----------- --------- --------- ----------------
	 45.17    0.000276           0      5522           getdents
	 26.51    0.000162           0     23112        31 lstat
	 19.80    0.000121           0      5503      2638 open
	  4.91    0.000030           0      2864           close
	  1.48    0.000020           0       274           write
	  1.34    0.000018           0      2844           fstat
	...

It passes the test-suite for me, but this is another of one of those
really core functions, and certainly pretty subtle, so..

NOTE! The Linux lstat() system call is really quite cheap when everything
is cached, so the fact that this is quite noticeable on Linux is likely to
mean that it is *much* more noticeable on other operating systems. I bet
you'll see a much bigger performance improvement from this on Windows in
particular.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-21 13:05:27 -08:00
Linus Torvalds 7a51ed66f6 Make on-disk index representation separate from in-core one
This converts the index explicitly on read and write to its on-disk
format, allowing the in-core format to contain more flags, and be
simpler.

In particular, the in-core format is now host-endian (as opposed to the
on-disk one that is network endian in order to be able to be shared
across machines) and as a result we can dispense with all the
htonl/ntohl on accesses to the cache_entry fields.

This will make it easier to make use of various temporary flags that do
not exist in the on-disk format.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-21 12:44:31 -08:00
Steffen Prohaska ecf4831d89 Use is_absolute_path() in diff-lib.c, lockfile.c, setup.c, trace.c
Using the helper function to test for absolute paths makes porting easier.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-26 12:32:05 -08:00
Junio C Hamano e6cb314c08 Merge branch 'ph/diffopts'
* ph/diffopts:
  Reorder diff_opt_parse options more logically per topics.
  Make the diff_options bitfields be an unsigned with explicit masks.
  Use OPT_BIT in builtin-pack-refs
  Use OPT_BIT in builtin-for-each-ref
  Use OPT_SET_INT and OPT_BIT in builtin-branch
  parse-options new features.
2007-11-18 15:50:16 -08:00
Pierre Habouzit 8f67f8aefb Make the diff_options bitfields be an unsigned with explicit masks.
reverse_diff was a bit-value in disguise, it's merged in the flags now.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-11 16:54:15 -08:00
Junio C Hamano fb63d7f889 git-add: make the entry stat-clean after re-adding the same contents
Earlier in commit 0781b8a9b2
(add_file_to_index: skip rehashing if the cached stat already
matches), add_file_to_index() were taught not to re-add the path
if it already matches the index.

The change meant well, but was not executed quite right.  It
used ie_modified() to see if the file on the work tree is really
different from the index, and skipped adding the contents if the
function says "not modified".

This was wrong.  There are three possible comparison results
between the index and the file in the work tree:

 - with lstat(2) we _know_ they are different.  E.g. if the
   length or the owner in the cached stat information is
   different from the length we just obtained from lstat(2), we
   can tell the file is modified without looking at the actual
   contents.

 - with lstat(2) we _know_ they are the same.  The same length,
   the same owner, the same everything (but this has a twist, as
   described below).

 - we cannot tell from lstat(2) information alone and need to go
   to the filesystem to actually compare.

The last case arises from what we call 'racy git' situation,
that can be caused with this sequence:

    $ echo hello >file
    $ git add file
    $ echo aeiou >file ;# the same length

If the second "echo" is done within the same filesystem
timestamp granularity as the first "echo", then the timestamp
recorded by "git add" and the timestamp we get from lstat(2)
will be the same, and we can mistakenly say the file is not
modified.  The path is called 'racily clean'.  We need to
reliably detect racily clean paths are in fact modified.

To solve this problem, when we write out the index, we mark the
index entry that has the same timestamp as the index file itself
(that is the time from the point of view of the filesystem) to
tell any later code that does the lstat(2) comparison not to
trust the cached stat info, and ie_modified() then actually goes
to the filesystem to compare the contents for such a path.

That's all good, but it should not be used for this "git add"
optimization, as the goal of "git add" is to actually update the
path in the index and make it stat-clean.  With the false
optimization, we did _not_ cause any data loss (after all, what
we failed to do was only to update the cached stat information),
but it made the following sequence leave the file stat dirty:

    $ echo hello >file
    $ git add file
    $ echo hello >file ;# the same contents
    $ git add file

The solution is not to use ie_modified() which goes to the
filesystem to see if it is really clean, but instead use
ie_match_stat() with "assume racily clean paths are dirty"
option, to force re-adding of such a path.

There was another problem with "git add -u".  The codepath
shares the same issue when adding the paths that are found to be
modified, but in addition, it asked "git diff-files" machinery
run_diff_files() function (which is "git diff-files") to list
the paths that are modified.  But "git diff-files" machinery
uses the same ie_modified() call so that it does not report
racily clean _and_ actually clean paths as modified, which is
not what we want.

The patch allows the callers of run_diff_files() to pass the
same "assume racily clean paths are dirty" option, and makes
"git-add -u" codepath to use that option, to discover and re-add
racily clean _and_ actually clean paths.

We could further optimize on top of this patch to differentiate
the case where the path really needs re-adding (i.e. the content
of the racily clean entry was indeed different) and the case
where only the cached stat information needs to be refreshed
(i.e. the racily clean entry was actually clean), but I do not
think it is worth it.

This patch applies to maint and all the way up.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-10 00:37:39 -08:00