Commit graph

3325 commits

Author SHA1 Message Date
Linus Torvalds 01df529722 Handling large files with GIT
On Tue, 14 Feb 2006, Linus Torvalds wrote:
>
> Here, btw, is the trivial diff to turn my previous "tree-resolve" into a
> "resolve tree relative to the current branch".

Gaah. It was trivial, and it happened to work fine for my test-case, but
when I started looking at not doing that extremely aggressive subdirectory
merging, that showed a few other issues...

So in case people want to try, here's a third patch. Oh, and it's against
my _original_ path, not incremental to the middle one (ie both patches two
and three are against patch #1, it's not a nice series).

Now I'm really done, and won't be sending out any more patches today.
Sorry for the noise.

		Linus

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-15 23:35:40 -08:00
Linus Torvalds 492e0759bf Handling large files with GIT
On Tue, 14 Feb 2006, Junio C Hamano wrote:

> Linus Torvalds <torvalds@osdl.org> writes:
>
> > If somebody is interested in making the "lots of filename changes" case go
> > fast, I'd be more than happy to walk them through what they'd need to
> > change. I'm just not horribly motivated to do it myself. Hint, hint.
>
> In case anybody is wondering, I share the same feeling.  I
> cannot say I'd be "more than happy to" clean up potential
> breakages during the development of such changes, but if the
> change eventually would help certain use cases, I can be
> persuaded to help debugging such a mess ;-).

Actually, I got interested in seeing how hard this is, and wrote a simple
first cut at doing a tree-optimized merger.

Let me shout a bit first:

  THIS IS WORKING CODE, BUT BE CAREFUL: IT'S A TECHNOLOGY DEMONSTRATION
  RATHER THAN THE FINAL PRODUCT!

With that out of the way, let me descibe what this does (and then describe
the missing parts).

This is basically a three-way merge that works entirely on the "tree"
level, rather than on the index. A lot of the _concepts_ are the same,
though, and if you're familiar with the results of an index merge, some of
the output will make more sense.

You give it three trees: the base tree (tree 0), and the two branches to
be merged (tree 1 and tree 2 respectively). It will then walk these three
trees, and resolve them as it goes along.

The interesting part is:
 - it can resolve whole sub-directories in one go, without actually even
   looking recursively at them. A whole subdirectory will resolve the same
   way as any individual files will (although that may need some
   modification, see later).
 - if it has a "content conflict", for subdirectories that means "try to
   do a recursive tree merge", while for non-subdirectories it's just a
   content conflict and we'll output the stage 1/2/3 information.
 - a successful merge will output a single stage 0 ("merged") entry,
   potentially for a whole subdirectory.
 - it outputs all the resolve information on stdout, so something like the
   recursive resolver can pretty easily parse it all.

Now, the caveats:
 - we probably need to be more careful about subdirectory resolves. The
   trivial case (both branches have the exact same subdirectory) is a
   trivial resolve, but the other cases ("branch1 matches base, branch2 is
   different" probably can't be silently just resolved to the "branch2"
   subdirectory state, since it might involve renames into - or out of -
   that subdirectory)
 - we do not track the current index file at all, so this does not do the
   "check that index matches branch1" logic that the three-way merge in
   git-read-tree does. The theory is that we'd do a full three-way merge
   (ignoring the index and working directory), and then to update the
   working tree, we'd do a two-way "git-read-tree branch1->result"
 - I didn't actually make it do all the trivial resolve cases that
   git-read-tree does. It's a technology demonstration.

Finally (a more serious caveat):
 - doing things through stdout may end up being so expensive that we'd
   need to do something else. In particular, it's likely that I should
   not actually output the "merge results", but instead output a "merge
   results as they _differ_ from branch1"

However, I think this patch is already interesting enough that people who
are interested in merging trees might want to look at it. Please keep in
mind that tech _demo_ part, and in particular, keep in mind the final
"serious caveat" part.

In many ways, the really _interesting_ part of a merge is not the result,
but how it _changes_ the branch we're merging into. That's particularly
important as it should hopefully also mean that the output size for any
reasonable case is minimal (and tracks what we actually need to do to the
current state to create the final result).

The code very much is organized so that doing the result as a "diff
against branch1" should be quite easy/possible. I was actually going to do
it, but I decided that it probably makes the output harder to read. I
dunno.

Anyway, let's think about this kind of approach.. Note how the code itself
is actually quite small and short, although it's prbably pretty "dense".

As an interesting test-case, I'd suggest this merge in the kernel:

	git-merge-tree $(git-merge-base 4cbf876 7d2babc) 4cbf876 7d2babc

which resolves beautifully (there are no actual file-level conflicts), and
you can look at the output of that command to start thinking about what
it does.

The interesting part (perhaps) is that timing that command for me shows
that it takes all of 0.004 seconds.. (the git-merge-base thing takes
considerably more ;)

The point is, we _can_ do the actual merge part really really quickly.

		Linus

PS. Final note: when I say that it is "WORKING CODE", that is obviously by
my standards. IOW, I tested it once and it gave reasonable results - so it
must be perfect.

Whether it works for anybody else, or indeed for any other test-case, is
not my problem ;)

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-15 23:35:40 -08:00
Junio C Hamano be97bd1b88 Merge branch 'jc/add'
* jc/add:
  Detect misspelled pathspec to git-add
2006-02-15 19:42:15 -08:00
Junio C Hamano 5f906b1c34 Merge fixes up to 1.2.1 2006-02-15 19:39:21 -08:00
Josef Weidendorfer babfaf8dee More useful/hinting error messages in git-checkout
Signed-off-by: Josef Weidendorfer <Josef.Weidendorfer@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-15 19:14:04 -08:00
Fernando J. Pereda 6c5c62f340 Print an error if cloning a http repo and NO_CURL is set
If Git is compiled with NO_CURL=YesPlease and one tries to
clone a http repository, git-clone tries to call the curl
binary. This trivial patch prints an error instead in such
situation.

Signed-off-by: Fernando J. Pereda <ferdy@gentoo.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-15 19:14:01 -08:00
Junio C Hamano f8f135c9ba packed objects: minor cleanup
The delta depth is unsigned.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-15 13:03:27 -08:00
Junio C Hamano 45e48120bb Detect misspelled pathspec to git-add
This is in the same spirit as an earlier patch for git-commit.
It does an extra ls-files to avoid complaining when a fully
tracked directory name is given on the command line (otherwise
--others restriction would say the pathspec does not match).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-15 01:56:55 -08:00
Junio C Hamano 6becd7da87 ls-files --error-unmatch pathspec error reporting fix.
Earlier patch mistakenly used prefix_len when it meant
prefix_offset.  The latter is to strip the leading directories
when run from a subdirectory.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-15 01:10:13 -08:00
Junio C Hamano e8a1a11d4e Merge branch 'kh/svn'
* kh/svn:
  git-svnimport: -r adds svn revision number to commit messages
2006-02-14 17:51:50 -08:00
Junio C Hamano 756e3ee0c6 Merge branch 'jc/commit'
* jc/commit:
  commit: detect misspelled pathspec while making a partial commit.
  combine-diff: diff-files fix (#2)
  combine-diff: diff-files fix.
2006-02-14 17:51:02 -08:00
Junio C Hamano 9b6c66e05c Merge branch 'jc/rebase'
* jc/rebase:
  rebase: allow a hook to refuse rebasing.
2006-02-14 17:49:00 -08:00
Junio C Hamano 709fb393ca Merge branch 'ra/email'
* ra/email:
  send-email: Add --cc
  send-email: Add some options for controlling how addresses are automatically added to the cc: list.
2006-02-14 17:46:41 -08:00
Junio C Hamano 504fe714fe checkout: fix dirty-file display.
When we refused to switch branches, we incorrectly showed
differences from the branch we would have switched to.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-14 16:05:57 -08:00
Junio C Hamano bba319b5ce commit: detect misspelled pathspec while making a partial commit.
When you say "git commit Documentaiton" to make partial commit
for the files only in that directory, we did not detect that as
a misspelled pathname and attempted to commit index without
change.  If nothing matched, there is no harm done, but if the
index gets modified otherwise by having another valid pathspec
or after an explicit update-index, a user will not notice
without paying attention to the "git status" preview.

This introduces --error-unmatch option to ls-files, and uses it
to detect this common user error.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-14 14:48:22 -08:00
Karl Hasselström 0a48a344c6 git-svnimport: -r adds svn revision number to commit messages
New -r flag for prepending the corresponding Subversion revision
number to each commit message.

Signed-off-by: Karl Hasselström <kha@treskal.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-14 01:30:43 -08:00
Junio C Hamano 9ece7169a4 combine-diff: diff-files fix (#2)
The raw format "git-diff-files -c" to show unmerged state forgot
to initialize the status fields from parents, causing NUL
characters to be emitted.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-14 01:11:42 -08:00
Junio C Hamano 6a9b87972f Merge some proposed fixes
Conflicts:

	Documentation/git-commit.txt - taking the post 1.2.0 semantics.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-13 23:34:58 -08:00
Junio C Hamano 057f98eda1 Merge branch 'pb/bisect'
* pb/bisect:
  Properly git-bisect reset after bisecting from non-master head
2006-02-13 23:26:53 -08:00
Junio C Hamano 713a11fceb combine-diff: diff-files fix.
When showing a conflicted merge from index stages and working
tree file, we did not fetch the mode from the working tree,
and mistook that as a deleted file.  Also if the manual
resolution (or automated resolution by git rerere) ended up
taking either parent's version, we did not show _anything_ for
that path.  Either was quite bad and confusing.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-13 23:07:04 -08:00
Fredrik Kuivinen 3654638513 s/SHELL/SHELL_PATH/ in Makefile
With the current Makefile we don't use the shell chosen by the
platform specific defines when we invoke GIT-VERSION-GEN.

Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-13 22:13:22 -08:00
Junio C Hamano 4631c0035d bisect: remove BISECT_NAMES after done.
I noticed that we forgot to clean this file and kept it that
way, while trying to help with Andrew's bisect problem.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-13 21:55:27 -08:00
Junio C Hamano 41ac06c7a3 Documentation: git-ls-files asciidocco.
Noticed by Jon Nelson.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-13 21:52:10 -08:00
Ryan Anderson da140f8bbf send-email: Add --cc
Since Junio used this in an example, and I've personally tried to use it, I
suppose the option should actually exist.

Signed-off-by: Ryan Anderson <ryan@michonline.com>
2006-02-13 03:32:10 -05:00
Junio C Hamano 64491e1ea9 Documentation: git-commit in 1.2.X series defaults to --include.
The documentation was mistakenly describing the --only semantics to
be default.  The 1.2.0 release and its maintenance series 1.2.X will
keep the traditional --include semantics as the default.  Clarify the
situation.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-13 00:32:10 -08:00
Ryan Anderson a985d595ad send-email: Add some options for controlling how addresses are automatically added to the cc: list.
Signed-off-by: Ryan Anderson <ryan@michonline.com>
2006-02-13 03:32:01 -05:00
Junio C Hamano 9a111c91b0 rebase: allow a hook to refuse rebasing.
This lets a hook to interfere a rebase and help prevent certain
branches from being rebased by mistake.  A sample hook to show
how to prevent a topic branch that has already been merged into
publish branch.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-13 00:17:33 -08:00
Junio C Hamano 4170a19587 git-commit: Now --only semantics is the default.
This changes the "git commit paths..." to default to --only
semantics from traditional --include semantics, as agreed on the
list.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 23:55:07 -08:00
Junio C Hamano bd9ca0baff GIT 1.2.0
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 13:14:53 -08:00
Junio C Hamano 4bbdfab766 Fix "test: unexpected operator" on bsd
This fixes the same issue as a previous fix by Alex Riesen does.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 13:13:33 -08:00
Petr Baudis 810255fd12 Properly git-bisect reset after bisecting from non-master head
git-bisect reset without an argument would return to master even
if the bisecting started at a non-master branch. This patch makes
it save the original branch name to .git/head-name and restore it
afterwards.

This is also compatible with Cogito and cg-seek, so cg-status will
show that we are seeked on the bisect branch and cg-reset will
properly restore the original branch.

git-bisect start will refuse to work if it is not on a bisect but
.git/head-name exists; this is to protect against conflicts with
other seeking tools.

Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 13:07:02 -08:00
Junio C Hamano c5e09c1fbe git-commit: show dirtiness including index.
Earlier, when we switched a branch we used diff-files to show
paths that are dirty in the working tree.  But we allow switching
branches with updated index ("read-tree -m -u $old $new" works that
way), and only showing paths that have differences in the working
tree but not paths that are different in index was confusing.

This shows both as modified from the top commit of the branch we
just have switched to.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 13:05:53 -08:00
Junio C Hamano 024701f1d8 Make pack-objects chattier.
You could give -q to squelch it, but currently no tool does it.
This would make 'git clone host:repo here' over ssh not silent
again.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 13:01:54 -08:00
Alex Riesen 0dbc4e89bb avoid echo -e, there are systems where it does not work
FreeBSD 4.11 being one example: the built-in echo doesn't have -e,
and the installed /bin/echo does not do "-e" as well.
"printf" works, laking just "\e" and "\xAB'.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 11:36:19 -08:00
Alex Riesen ef1af9d9af fix "test: 2: unexpected operator" on bsd
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 11:36:17 -08:00
Linus Torvalds d7ee090d0d Fix object re-hashing
The hashed object lookup had a subtle bug in re-hashing: it did

	for (i = 0; i < count; i++)
		if (objs[i]) {
			.. rehash ..

where "count" was the old hash couny. Oon the face of it is obvious, since
it clearly re-hashes all the old objects.

However, it's wrong.

If the last old hash entry before re-hashing was in use (or became in use
by the re-hashing), then when re-hashing could have inserted an object
into the hash entries with idx >= count due to overflow. When we then
rehash the last old entry, that old entry might become empty, which means
that the overflow entries should be re-hashed again.

In other words, the loop has to be fixed to either traverse the whole
array, rather than just the old count.

(There's room for a slight optimization: instead of counting all the way
up, we can break when we see the first empty slot that is above the old
"count". At that point we know we don't have any collissions that we might
have to fix up any more. This patch only does the trivial fix)

[jc: with trivial fix on trivial fix]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 11:24:50 -08:00
Junio C Hamano 2b796360ac hashtable-based objects: minimum fixups.
Calling hashtable_index from find_object before objs is created
would result in division by zero failure.  Avoid it.

Also the given object name may not be aligned suitably for
unsigned int; avoid dereferencing casted pointer.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 05:12:39 -08:00
Johannes Schindelin 070879ca93 Use a hashtable for objects instead of a sorted list
In a simple test, this brings down the CPU time from 47 sec to 22 sec.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 05:12:39 -08:00
kent@lysator.liu.se 5b766ea901 Add howto about separating topics.
This howto consists of a footnote from an email by JC to the git
mailing list (<7vfyms0x4p.fsf@assigned-by-dhcp.cox.net>).

Signed-off-by: Kent Engstrom <kent@lysator.liu.se>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 05:02:42 -08:00
Junio C Hamano af8c28e145 Merge branch 'pb/repo'
* pb/repo:
  Add support for explicit type specifiers when calling git-repo-config
2006-02-12 05:02:30 -08:00
Junio C Hamano c611db196a Merge branch 'jc/fixdiff'
* jc/fixdiff:
  diff-tree: do not default to -c
2006-02-12 05:02:25 -08:00
Junio C Hamano 4890f62bc0 Avoid using "git-var -l" until it gets fixed.
This is to be nicer to people with unusable GECOS field.

"git-var -l" is currently broken in that when used by a user who
does not have a usable GECOS field and has not corrected it by
exporting GIT_COMMITTER_NAME environment variable it dies when
it tries to output GIT_COMMITTER_IDENT (same thing for AUTHOR).

"git-pull" used "git-var -l" only because it needed to get a
configuration variable before "git-repo-config --get" was
introduced.  Use the latter tool designed exactly for this
purpose.

"git-sh-setup" used "git-var GIT_AUTHOR_IDENT" without actually
wanting to use its value.  The only purpose was to cause the
command to check and barf if the repository format version
recorded in the $GIT_DIR/config file is too new for us to deal
with correctly.  Instead, use "repo-config --get" on a random
property and see if it die()s, and check if the exit status is
128 (comes from die -- missing variable is reported with exit
status 1, so we can tell that case apart).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 04:59:25 -08:00
Petr Baudis 7162dff3dd Add support for explicit type specifiers when calling git-repo-config
Currently, git-repo-config will just return the raw value of option
as specified in the config file; this makes things difficult for scripts
calling it, especially if the value is supposed to be boolean.

This patch makes it possible to ask git-repo-config to check if the option
is of the given type (int or bool) and write out the value in its
canonical form. If you do not pass --int or --bool, the behaviour stays
unchanged and the raw value is emitted.

This also incidentally fixes the segfault when option with no value is
encountered.

[jc: tweaked the option parsing a bit to make it easier to see
 that the patch does not change anything but the type stuff in
 the diff output.  Also changed to avoid "foo ? : bar" construct. ]

Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-12 00:26:54 -08:00
Junio C Hamano 6932c78cb4 diff-tree: do not default to -c
Marco says it breaks qgit.  This makes the flags a bit more
orthogonal.

  $ git-diff-tree -r --abbrev ca18

    No output from this command because you asked to skip merge by
    not having -m there.

  $ git-diff-tree -r -m --abbrev ca18
  ca182053c7
  :100644 100644 538d21d... 59042d1... M	Makefile
  :100644 100644 410b758... 6c47c3a... M	entry.c
  ca182053c7
  :100644 100644 30479b4... 59042d1... M	Makefile

    The same "independent sets of diff" as before without -c.

  $ git-diff-tree -r -m -c --abbrev ca18
  ca182053c7
  ::100644 100644 100644 538d21d... 30479b4... 59042d1... MM	Makefile

    Combined.

  $ git-diff-tree -r -c --abbrev ca18
  ca182053c7
  ::100644 100644 100644 538d21d... 30479b4... 59042d1... MM	Makefile

    Asking for combined without -m does not make sense, so -c
    implies -m.

We need to supply -c as default to whatchanged, which is a
one-liner.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-11 23:18:33 -08:00
Junio C Hamano 16139f9035 t5500: adjust to change in pack-object reporting behaviour.
Now pack-object is not as chatty when its stderr is not connected
to a terminal, so the test needs to be adjusted for that.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-11 23:08:23 -08:00
Junio C Hamano 1536dd9c61 Only call git-rerere if $GIT_DIR/rr-cache exists.
Johannes noticed that git-rerere depends on Digest.pm, and if
one does not use the command, one can live without it.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-11 18:55:43 -08:00
Christian Biesinger 7bbdeaa969 Use a relative path for SVN importing
The absolute path (with the leading slash) breaks SVN importing,
because it then looks for /trunk/... instead of /svn/trunk/...
(in my case, the repository URL was https://servername/svn/)

Signed-off-by: Christian Biesinger <cbiesinger@web.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-11 17:59:38 -08:00
Junio C Hamano 21fcd1bdea fetch-clone progress: finishing touches.
This makes fetch-pack also report the progress of packing part.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-11 17:54:18 -08:00
Linus Torvalds 98deeaa82f Fix fetch-clone in the presense of signals
We shouldn't fail a fetch just because a signal might have interrupted
the read.

Normally, we don't install any signal handlers, so EINTR really shouldn't
happen. That said, really old versions of Linux will interrupt an
interruptible system call even for signals that turn out to be ignored
(SIGWINCH is the classic example - resizing your xterm would cause it).
The same might well be true elsewhere too.

Also, since receive_keep_pack() doesn't control the caller, it can't know
that no signal handlers exist.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-11 16:50:03 -08:00
Linus Torvalds c548cf4ee0 Make "git clone" pack-fetching download statistics better
Average it out over a few events to make the numbers stable, and fix the
silly usec->binary-ms conversion.

Yeah, yeah, it's arguably eye-candy to keep the user calm, but let's do
that right.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-11 16:49:52 -08:00