Commit graph

83 commits

Author SHA1 Message Date
Linus Torvalds 566b5c057c Optimize the three-way merge of git-read-tree
As mentioned, the three-way case *should* be as trivial as the
following. It passes all the tests, and I verified that a conflicting
merge in the 100,000 file horror-case merged correctly (with the conflict
markers) in 0.687 seconds with this, so it works, but I'm lazy and
somebody else should double-check it [jc: followed all three-way merge
codepaths and verified it removes when it should].

Without this patch, the merge took 8.355 seconds, so this patch
really does make a huge difference for merge performance with lots and
lots of files, and we're not talking percentages, we're talking
orders-of-magnitude differences!

Now "unpack_trees()" is just fast enough that we don't need to avoid it
(although it's probably still a good idea to eventually convert it to use
the traverse_trees() infrastructure some day - just to avoid having
extraneous tree traversal functions).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-10 23:02:14 -07:00
Linus Torvalds d699676dda Optimize the two-way merge of git-read-tree too
This trivially optimizes the two-way merge case of git-read-tree too,
which affects switching branches.

When you have tons and tons of files in your repository, but there are
only small differences in the branches (maybe just a couple of files
changed), the biggest cost of the branch switching was actually just the
index calculations.

This fixes it (timings for switching between the "testing" and "master"
branches in the 100,000 file testing-repo-from-hell, where the branches
only differ in one small file).

Before:
	[torvalds@woody bummer]$ time git checkout master
	real    0m9.919s
	user    0m8.461s
	sys     0m0.264s

After:
	[torvalds@woody bummer]$ time git checkout testing
	real    0m0.576s
	user    0m0.348s
	sys     0m0.228s

so it's easily an order of magnitude different.

This concludes the series. I think we could/should do the three-way merge
too (to speed up merges), but I'm lazy. Somebody else can do it.

The rule is very simple: you need to remove the old entry if:
 - you want to remove the file entirely
 - you replace it with a "merge conflict" entry (ie a non-stage-0 entry)

and you can avoid removing it if you either

 - keep the old one
 - or resolve it to a new one.

and these rules should all be valid for the three-way case too.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-10 14:00:25 -07:00
Linus Torvalds 288f072ec0 Optimize the common cases of git-read-tree
This optimizes bind_merge() and oneway_merge() to not unnecessarily
remove and re-add the old index entries when they can just get replaced
by updated ones.

This makes these operations much faster for large trees (where "large"
is in the 50,000+ file range), because we don't unnecessarily move index
entries around in the index array all the time.

Using the "bummer" tree (a test-tree with 100,000 files) we get:

Before:
	[torvalds@woody bummer]$ time git commit -m"Change one file" 50/500
	real    0m9.470s
	user    0m8.729s
	sys     0m0.476s

After:
	[torvalds@woody bummer]$ time git commit -m"Change one file" 50/500
	real    0m1.173s
	user    0m0.720s
	sys     0m0.452s

so for large trees this is easily very noticeable indeed.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-10 14:00:11 -07:00
Linus Torvalds b48d5a050a Move old index entry removal from "unpack_trees()" into the individual functions
This makes no changes to current code, but it allows the individual merge
functions to decide what to do about the old entry.  They might decide to
update it in place, rather than force them to always delete and re-add it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-10 13:59:19 -07:00
Linus Torvalds 933bf40a5c Start moving unpack-trees to "struct tree_desc"
This doesn't actually change any real code, but it changes the interface
to unpack_trees() to take an array of "struct tree_desc" entries, the same
way the tree-walk.c functions do.

The reason for this is that we would be much better off if we can do the
tree-unpacking using the generic "traverse_trees()" functionality instead
of having to the special "unpack" infrastructure.

This really is a pretty minimal diff, just to change the calling
convention. It passes all the tests, and looks sane. There were only two
users of "unpack_trees()": builtin-read-tree and merge-recursive, and I
tried to keep the changes minimal.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-10 02:30:44 -07:00
Junio C Hamano 936492d3cf unpack-trees.c: assume submodules are clean during check-out
Sven originally raised this issue:

    If you have a submodule checked out and you go back (or
    forward) to a revision of the supermodule that contains a
    different revision of the submodule and then switch to
    another revision, it will complain that the submodule is not
    uptodate, because git simply didn't update the submodule in
    the first move.

The current policy is to consider it is perfectly normal that
checked-out submodule is out-of-sync wrt the supermodule index.
At least until we introduce a superproject repository
configuration option that says "in this repository, I do care
about this submodule and at any time I move around in the
superproject, recursively check out the submodule to match", it
is a reasonable policy, as we currently do not recursively
checkout the submodules at all.  The most extreme case of this
policy is that the superproject index knows about the submodule
but the subdirectory does not even have to be checked out.

The function verify_uptodate(), called during the two-way merge
aka branch switching, is about "make sure the filesystem entity
that corresponds to this cache entry is up to date, lest we lose
the local modifications".  As we explicitly allow submodule
checkout to drift from the supermodule index entry, the check
should say "Ok, for submodules, not matching is the norm" for
now.

Later when we have the ability to mark "I care about this
submodule to be always in sync with the superproject" (thereby
implementing automatic recursive checkout and perhaps diff,
among other things), we should check if the submodule in
question is marked as such and perform the current test.

Acked-by: Lars Hjemli <hjemli@gmail.com>
Acked-by: Sven Verdoolaege <skimo@kotnet.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-05 10:55:55 -07:00
René Scharfe 1843d8d545 cleanup unpack-trees.c: shrink struct tree_entry_list
Remove the two write-only fields executable and symlink from struct
tree_entry_list.  Also replace usage of the field directory with
S_ISDIR checks on the mode field, and then remove this now obsolete
field, too.  Noticed by David Kastrup.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-24 17:28:10 -07:00
Sven Verdoolaege 0cf7375542 unpack-trees.c: assume submodules are clean during check-out
In particular, when moving back to a commit without a given submodule
and then moving back forward to a commit with the given submodule,
we shouldn't complain that updating would lose untracked file in
the submodule, because git currently does not checkout subprojects
during superproject check-out.

Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-18 17:01:00 -07:00
Junio C Hamano ec0603e13c Teach read-tree 2-way merge to ignore intermediate symlinks
Earlier in 16a4c61, we taught "read-tree -m -u" not to be
confused when switching from a branch that has a path frotz/filfre
to another branch that has a symlink frotz that points at xyzzy/
directory.  The fix was incomplete in that it was still confused
when coming back (i.e. switching from a branch with frotz -> xyzzy/
to another branch with frotz/filfre).

This fix is rather expensive in that for a path that is created
we would need to see if any of the leading component of that
path exists as a symbolic link in the filesystem (in which case,
we know that path itself does not exist, and the fact we already
decided to check it out tells us that in the index we already
know that symbolic link is going away as there is no D/F
conflict).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-12 02:22:53 -07:00
Junio C Hamano 7df6ddf51e Merge branch 'maint-1.5.1' into maint
* maint-1.5.1:
  annotate: make it work from subdirectories.
  git-config: Correct asciidoc documentation for --int/--bool
  t1300: Add tests for git-config --bool --get
  unpack-trees.c: verify_uptodate: remove dead code
  Use PATH_MAX instead of TEMPFILE_PATH_LEN
  branch: fix segfault when resolving an invalid HEAD
2007-05-20 19:57:00 -07:00
Sven Verdoolaege 0a76f66524 unpack-trees.c: verify_uptodate: remove dead code
This code was killed by commit fcc387db9b.

Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-20 14:40:41 -07:00
Junio C Hamano 16a4c6176a read-tree -m -u: avoid getting confused by intermediate symlinks.
When switching from a branch with both x86_64/boot/Makefile and
i386/boot/Makefile to another branch that has x86_64/boot as a
symlink pointing at ../i386/boot, the code incorrectly removed
i386/boot/Makefile.

This was because we first removed everything under x86_64/boot
to make room to create a symbolic link x86_64/boot, then removed
x86_64/boot/Makefile which no longer exists but now is pointing
at i386/boot/Makefile, thanks to the symlink we just created.

This fixes it by using the has_symlink_leading_path() function
introduced previously for git-apply in the checkout codepath.
Earlier, "git checkout" was broken in t4122 test due to this
bug, and the test had an extra "git reset --hard" as a
workaround, which is removed because it is not needed anymore.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-11 22:33:31 -07:00
Nicolas Pitre 55a9137d8a delay progress display when checking out files
Let's start displaying progress only if more than 50% of total number
of files remains to be checked out after 2 seconds.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-22 22:18:05 -07:00
Nicolas Pitre 13aaf14825 make progress "title" part of the common progress interface
If the progress bar ends up in a box, better provide a title for it too.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-22 22:18:05 -07:00
Nicolas Pitre 96a02f8f6d common progress display support
Instead of having this code duplicated in multiple places, let's have
a common interface for progress display.  If someday someone wishes to
display a cheezy progress bar instead then only one file will have to
be changed.

Note: I left merge-recursive.c out since it has a strange notion of
progress as it apparently increase the expected total number as it goes.
Someone with more intimate knowledge of what that is supposed to mean
might look at converting it to the common progress interface.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-22 22:18:05 -07:00
Junio C Hamano 4c4caafc9c Treat D/F conflict entry more carefully in unpack-trees.c::threeway_merge()
This fixes three buglets in threeway_merge() regarding D/F
conflict entries.

* After finishing with path D and handling path D/F, some stages
  have D/F conflict entry which are obviously non-NULL.  For the
  purpose of determining if the path D/F is missing in the
  ancestor, they should not be taken into account.

* D/F conflict entry is a marker to say "this stage does _not_
  have the path", so do not send them to keep_entry().

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10 12:55:51 -07:00
Junio C Hamano ea4b52a86f t1000: fix case table.
Case #10 is not handled with unpack-trees.c:threeway_merge()
internally, unless under the agressive rule, and it is not a
bug.  As the test expects, ND (one side did not do anything,
other side deleted) case was meant to be handled by the caller's
policy (e.g. git-merge-one-file or git-merge-recursive).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10 12:55:51 -07:00
Junio C Hamano c81935348b Fix switching to a branch with D/F when current branch has file D.
This loosens the over-eager verify_absent() check that gets
upset to find directory D in the current working tree when
switching to a branch that has a file there.  The check needs to
make sure that we do not lose precious working tree files as a
result of removing directory D and replacing it with the file
from the other branch, which is a tad expensive but this is a
less common case.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-04 00:25:10 -07:00
Junio C Hamano b8ba1535bd Fix twoway_merge that passed d/f conflict marker to merged_entry().
When switching from one tree to another, we should not send a
marker that says "this file does not exist in the new tree -- I
am a placeholder to tell you that, and not a real blob" down to
merged_entry() as the result of the merge.
2007-04-04 00:19:29 -07:00
Junio C Hamano 9a4d8fdc25 unpack-trees: get rid of *indpos parameter.
This variable keeps track of which entry in the original index
the traversal is looking at, and belongs to the unpack_trees_options
structure along with other traversal status information.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-04 00:19:28 -07:00
Junio C Hamano 7f7932ab25 unpack_trees.c: pass unpack_trees_options structure to keep_entry() as well.
Other decision functions, deleted_entry() and merged_entry() take one as
their parameter, and this function should.  I'll be introducing a separate
index to build the result in, and am planning to pass it as the part of the
structure.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-04 00:19:28 -07:00
Linus Torvalds 6fda5e5180 Initialize tree descriptors with a helper function rather than by hand.
This removes slightly more lines than it adds, but the real reason for
doing this is that future optimizations will require more setup of the
tree descriptor, and so we want to do it in one place.

Also renamed the "desc.buf" field to "desc.buffer" just to trigger
compiler errors for old-style manual initializations, making sure I
didn't miss anything.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-21 10:21:57 -07:00
Junio C Hamano 85023577a8 simplify inclusion of system header files.
This is a mechanical clean-up of the way *.c files include
system header files.

 (1) sources under compat/, platform sha-1 implementations, and
     xdelta code are exempt from the following rules;

 (2) the first #include must be "git-compat-util.h" or one of
     our own header file that includes it first (e.g. config.h,
     builtin.h, pkt-line.h);

 (3) system headers that are included in "git-compat-util.h"
     need not be included in individual C source files.

 (4) "git-compat-util.h" does not have to include subsystem
     specific header files (e.g. expat.h).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20 09:51:35 -08:00
Junio C Hamano f388cec3d7 Merge branch 'jc/read-tree-ignore'
* jc/read-tree-ignore:
  read-tree: document --exclude-per-directory
  Loosen "working file will be lost" check in Porcelain-ish
  read-tree: further loosen "working file will be lost" check.
2006-12-13 11:10:24 -08:00
Junio C Hamano f8a9d42872 read-tree: further loosen "working file will be lost" check.
This follows up commit ed93b449 where we removed overcautious
"working file will be lost" check.

A new option "--exclude-per-directory=.gitignore" can be used to
tell the "git-read-tree" command that the user does not mind
losing contents in untracked files in the working tree, if they
need to be overwritten by a merge (either a two-way "switch
branches" merge, or a three-way merge).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-05 23:25:52 -08:00
Junio C Hamano 0fb1eaa885 unpack-trees: make sure "df_conflict_entry.name" is NUL terminated.
The structure that ends with a flexible array member (or 0
length array with older GCC) "char name[FLEX_ARRAY]" is
allocated on the stack and we use it after clearing its entire
size with memset.  That does not guarantee that "name" is
properly NUL terminated as we intended on platforms with more
forgiving structure alignment requirements.

Reported breakage on m68k by Roman Zippel.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-04 14:24:28 -08:00
Junio C Hamano ed93b449c5 merge: loosen overcautious "working file will be lost" check.
The three-way merge complained unconditionally when a path that
does not exist in the index is involved in a merge when it
existed in the working tree.  If we are merging an old version
that had that path tracked, but the path is not tracked anymore,
and if we are merging that old version in, the result will be
that the path is not tracked.  In that case we should not
complain.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-10-27 17:16:39 -07:00
Shawn Pearce e702496e43 Convert memcpy(a,b,20) to hashcpy(a,b).
This abstracts away the size of the hash values when copying them
from memory location to memory location, much as the introduction
of hashcmp abstracted away hash value comparsion.

A few call sites were using char* rather than unsigned char* so
I added the cast rather than open hashcpy to be void*.  This is a
reasonable tradeoff as most call sites already use unsigned char*
and the existing hashcmp is also declared to be unsigned char*.

[jc: Splitted the patch to "master" part, to be followed by a
 patch for merge-recursive.c which is not in "master" yet.

 Fixed the cast in the latter hunk to combine-diff.c which was
 wrong in the original.

 Also converted ones left-over in combine-diff.c, diff-lib.c and
 upload-pack.c ]

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-23 13:53:10 -07:00
David Rientjes a89fccd281 Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length.
Introduces global inline:

	hashcmp(const unsigned char *sha1, const unsigned char *sha2)

Uses memcmp for comparison and returns the result based on the length of
the hash name (a future runtime decision).

Acked-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-17 14:23:53 -07:00
David Rientjes 96f1e58f52 remove unnecessary initializations
[jc: I needed to hand merge the changes to the updated codebase,
 so the result needs to be checked.]

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-15 21:22:20 -07:00
David Rientjes 6f002f984f use appropriate typedefs
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-15 16:12:09 -07:00
Johannes Schindelin 076b0adcf9 read-tree: move merge functions to the library
This will allow merge-recursive to use the read-tree functionality
without exec()ing git-read-tree.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-30 23:31:39 -07:00
Johannes Schindelin 16da134b1f read-trees: refactor the unpack_trees() part
Basically, the options are passed by a struct unpack_trees_options now.
That's all.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-30 23:31:31 -07:00