Commit graph

11 commits

Author SHA1 Message Date
Junio C Hamano b65982b608 Optimize "diff-index --cached" using cache-tree
When running "diff-index --cached" after making a change to only a small
portion of the index, there is no point unpacking unchanged subtrees into
the index recursively, only to find that all entries match anyway.  Tweak
unpack_trees() logic that is used to read in the tree object to catch the
case where the tree entry we are looking at matches the index as a whole
by looking at the cache-tree.

As an exercise, after modifying a few paths in the kernel tree, here are
a few numbers on my Athlon 64X2 3800+:

    (without patch, hot cache)
    $ /usr/bin/time git diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.07user 0.02system 0:00.09elapsed 102%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+9407minor)pagefaults 0swaps

    (with patch, hot cache)
    $ /usr/bin/time ../git.git/git-diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.02user 0.00system 0:00.02elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+2446minor)pagefaults 0swaps

Cold cache numbers are very impressive, but it does not matter very much
in practice:

    (without patch, cold cache)
    $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
    $ /usr/bin/time git diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.06user 0.17system 0:10.26elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    247032inputs+0outputs (1172major+8237minor)pagefaults 0swaps

    (with patch, cold cache)
    $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
    $ /usr/bin/time ../git.git/git-diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.02user 0.01system 0:01.01elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    18440inputs+0outputs (79major+2369minor)pagefaults 0swaps

This of course helps "git status" as well.

    (without patch, hot cache)
    $ /usr/bin/time ../git.git/git-status >/dev/null
    0.17user 0.18system 0:00.35elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+5336outputs (0major+10970minor)pagefaults 0swaps

    (with patch, hot cache)
    $ /usr/bin/time ../git.git/git-status >/dev/null
    0.10user 0.16system 0:00.27elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+5336outputs (0major+3921minor)pagefaults 0swaps

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-25 11:35:29 -07:00
Junio C Hamano d11b8d3425 write-tree --ignore-cache-tree
This allows you to discard the cache-tree information before writing the
tree out of the index (i.e. it always recomputes the tree object names for
all the subtrees).

This is only useful as a debug option, so I did not bother documenting it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-20 11:07:07 -07:00
Junio C Hamano b9d37a5420 Move prime_cache_tree() to cache-tree.c
The interface to build cache-tree belongs there.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-20 04:16:41 -07:00
Nanako Shiraishi 7ba04d9f37 cache-tree.c: make cache_tree_find() static
This function is not used by any other file.

Signed-off-by: Nanako Shiraishi <nanako3@lavabit.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-16 08:50:27 -07:00
Junio C Hamano 45525bd022 Make error messages from cherry-pick/revert more sensible
The original "rewrite in C" did somewhat a sloppy job while
stealing code from git-write-tree.

The caller pretends as if the write_tree() function would return
an error code and being able to issue a sensible error message
itself, but write_tree() function just calls die() and never
returns an error.  Worse yet, the function claims that it was
running git-write-tree (which is no longer true after
cherry-pick stole it).

Tested-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-05 00:39:19 -08:00
Pierre Habouzit 1dffb8fa80 Small cache_tree_write refactor.
This function cannot fail, make it void. Also make write_one act on a
const char* instead of a char*.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-26 02:27:06 -07:00
Junio C Hamano 6bd20358a9 write-tree: --prefix=<path>
The "bind" commit can express an aggregation of multiple
projects into a single commit.

In such an organization, there would be one project, root of
whose tree object is at the same level of the root of the
aggregated projects, and other projects have their toplevel in
separate subdirectories.  Let's call that root level project the
"primary project", and call other ones just "subprojects".

You would first read-tree the primary project, and then graft
the subprojects under their appropriate location using read-tree
--prefix=<subdir>/ repeatedly.

To write out a tree object from such an index for a subproject,
write-tree --prefix=<subdir>/ is used.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-01 22:29:16 -07:00
Junio C Hamano 2956dd3bd7 cache_tree_update: give an option to update cache-tree only.
When the extra "dryrun" parameter is true, cache_tree_update()
recomputes the invalid entry but does not actually creates
new tree object.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-27 16:21:54 -07:00
Junio C Hamano 7927a55d5b read-tree: teach 1-way merege and plain read to prime cache-tree.
This teaches read-tree to fully populate valid cache-tree when
reading a tree from scratch, or reading a single tree into an
existing index, reusing only the cached stat information (i.e.
one-way merge).  We have already taught update-index about cache-tree,
so "git checkout" followed by updates to a few path followed by
a "git commit" would become very efficient.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-27 01:33:07 -07:00
Junio C Hamano bad68ec924 index: make the index file format extensible.
... and move the cache-tree data into it.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-24 21:24:13 -07:00
Junio C Hamano 749864627c Add cache-tree.
The cache_tree data structure is to cache tree object names that
would result from the current index file.

The idea is to have an optional file to record each tree object
name that corresponds to a directory path in the cache when we
run write_cache(), and read it back when we run read_cache().
During various index manupulations, we selectively invalidate
the parts so that the next write-tree can bypass regenerating
tree objects for unchanged parts of the directory hierarchy.

We could perhaps make the cache-tree data an optional part of
the index file, but that would involve the index format updates,
so unless we need it for performance reasons, the current plan
is to use a separate file, $GIT_DIR/index.aux to store this
information and link it with the index file with the checksum
that is already used for index file integrity check.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-23 20:18:16 -07:00