Commit graph

355 commits

Author SHA1 Message Date
Linus Torvalds e8bd78c3fc close_sha1_file(): make it easier to diagnose errors
A bug report with "unable to write sha1 file" made us realize that we do
not have enough information to guess why close() is failing.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-24 14:39:20 -07:00
Shawn O. Pearce fa3a0c94dc Clear the delta base cache if a pack is rebuilt
There is some risk that re-opening a regenerated pack file with
different offsets could leave stale entries within the delta base
cache that could be matched up against other objects using the same
"struct packed_git*" and pack offset.

Throwing away the entire delta base cache in this case is safer,
as we don't have to worry about a recycled "struct packed_git*"
matching to the wrong base object, resulting in delta apply
errors while unpacking an object.

Suggested-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-11 10:25:24 -08:00
Shawn O. Pearce 3d20c636af Clear the delta base cache during fast-import checkpoint
Otherwise we may reuse the same memory address for a totally
different "struct packed_git", and a previously cached object from
the prior occupant might be returned when trying to unpack an object
from the new pack.

Found-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-10 15:30:59 -08:00
Jeff King 915308b187 avoid 31-bit truncation in write_loose_object
The size of the content we are adding may be larger than
2.1G (i.e., "git add gigantic-file"). Most of the code-path
to do so uses size_t or unsigned long to record the size,
but write_loose_object uses a signed int.

On platforms where "int" is 32-bits (which includes x86_64
Linux platforms), we end up passing malloc a negative size.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-28 23:40:53 -08:00
Nicolas Pitre c74faea19e make sure packs to be replaced are closed beforehand
Especially on Windows where an opened file cannot be replaced, make
sure pack-objects always close packs it is about to replace. Even on
non Windows systems, this could save potential bad results if ever
objects were to be read from the new pack file using offset from the old
index.

This should fix t5303 on Windows.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Tested-by: Johannes Sixt <j6t@kdbg.org> (MinGW)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-10 17:56:05 -08:00
Junio C Hamano 0fd9d7e66d Merge branch 'bc/maint-keep-pack' into maint
* bc/maint-keep-pack:
  repack: only unpack-unreachable if we are deleting redundant packs
  t7700: test that 'repack -a' packs alternate packed objects
  pack-objects: extend --local to mean ignore non-local loose objects too
  sha1_file.c: split has_loose_object() into local and non-local counterparts
  t7700: demonstrate mishandling of loose objects in an alternate ODB
  builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
  repack: do not fall back to incremental repacking with [-a|-A]
  repack: don't repack local objects in packs with .keep file
  pack-objects: new option --honor-pack-keep
  packed_git: convert pack_local flag into a bitfield and add pack_keep
  t7700: demonstrate mishandling of objects in packs with a .keep file
2008-12-02 23:00:04 -08:00
Sam Vilain 35243577ab sha1_file.c: resolve confusion EACCES vs EPERM
An earlier commit 916d081 (Nicer error messages in case saving an object
to db goes wrong, 2006-11-09) confused EACCES with EPERM, the latter of
which is an unlikely error from mkstemp().

Signed-off-by: Sam Vilain <sam@vilain.net>
2008-11-27 19:11:21 -08:00
Joey Hess 65117abc04 sha1_file: avoid bogus "file exists" error message
This avoids the following misleading error message:

error: unable to create temporary sha1 filename ./objects/15: File exists

mkstemp can fail for many reasons, one of which, ENOENT, can occur if
the directory for the temp file doesn't exist. create_tmpfile tried to
handle this case by always trying to mkdir the directory, even if it
already existed. This caused errno to be clobbered, so one cannot tell
why mkstemp really failed, and it truncated the buffer to just the
directory name, resulting in the strange error message shown above.

Note that in both occasions that I've seen this failure, it has not been
due to a missing directory, or bad permissions, but some other, unknown
mkstemp failure mode that did not occur when I ran git again. This code
could perhaps be made more robust by retrying mkstemp, in case it was a
transient failure.

Signed-off-by: Joey Hess <joey@kitenet.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-27 18:48:53 -08:00
Brandon Casey 0f4dc14ac4 sha1_file.c: split has_loose_object() into local and non-local counterparts
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-12 10:29:22 -08:00
Brandon Casey 8d25931d6f packed_git: convert pack_local flag into a bitfield and add pack_keep
pack_keep will be set when a pack file has an associated .keep file.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-12 10:28:08 -08:00
Junio C Hamano 581000a419 Merge branch 'jc/maint-co-track' into maint
* jc/maint-co-track:
  Enhance hold_lock_file_for_{update,append}() API
  demonstrate breakage of detached checkout with symbolic link HEAD
  Fix "checkout --track -b newbranch" on detached HEAD
2008-11-02 13:36:14 -08:00
Junio C Hamano acd3b9eca8 Enhance hold_lock_file_for_{update,append}() API
This changes the "die_on_error" boolean parameter to a mere "flags", and
changes the existing callers of hold_lock_file_for_update/append()
functions to pass LOCK_DIE_ON_ERROR.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-19 12:35:37 -07:00
Björn Steinbrink 1fb23e6550 force_object_loose: Fix memory leak
read_packed_sha1 expectes its caller to free the buffer it returns, which
force_object_loose didn't do.

This leak is eventually triggered by "git gc", when it is manually invoked
or there are too many packs around, making gc totally unusable when there
are lots of unreachable objects.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-18 06:19:06 -07:00
Thomas Rast e32c0a9c38 sha1_file: link() returns -1 on failure, not errno
5723fe7 (Avoid cross-directory renames and linking on object creation,
2008-06-14) changed the call to use link() directly instead of through a
custom wrapper, but forgot that it returns 0 or -1, not 0 or errno.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-18 19:51:13 -07:00
Nicolas Pitre 4b480c6716 discard revindex data when pack list changes
This is needed to fix verify-pack -v with multiple pack arguments.

Also, in theory, revindex data (if any) must be discarded whenever
reprepare_packed_git() is called. In practice this is hard to trigger
though.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-22 22:00:22 -07:00
Steven Grimm ddd63e64e4 Optimize sha1_object_info for loose objects, not concurrent repacks
When dealing with a repository with lots of loose objects, sha1_object_info
would rescan the packs directory every time an unpacked object was referenced
before finally giving up and looking for the loose object. This caused a lot
of extra unnecessary system calls during git pack-objects; the code was
rereading the entire pack directory once for each loose object file.

This patch looks for a loose object before falling back to rescanning the
pack directory, rather than the other way around.

Signed-off-by: Steven Grimm <koreth@midwinter.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-05 21:21:20 -07:00
Nicolas Pitre ac9391093f restore legacy behavior for read_sha1_file()
Since commit 8eca0b47ff, it is possible
for read_sha1_file() to return NULL even with existing objects when they
are corrupted.  Previously a corrupted object would have terminated the
program immediately, effectively making read_sha1_file() return NULL
only when specified object is not found.

Let's restore this behavior for all users of read_sha1_file() and
provide a separate function with the ability to not terminate when
bad objects are encountered.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-14 23:35:32 -07:00
Junio C Hamano 948e7471e0 Merge branch 'sp/maint-pack-memuse'
* sp/maint-pack-memuse:
  Correct pack memory leak causing git gc to try to exceed ulimit

Conflicts:

	sha1_file.c
2008-07-09 14:46:46 -07:00
Shawn O. Pearce eac12e2d4d Correct pack memory leak causing git gc to try to exceed ulimit
When recursing to unpack a delta base we must unuse_pack() so that
the pack window for the current object does not remain pinned in
memory while the delta base is itself being unpacked and materialized
for our use.

On a long delta chain of 50 objects we may need to access 6 different
windows from a very large (>3G) pack file in order to obtain all
of the delta base content.  If the process ulimit permits us to
map/allocate only 1.5G we must release windows during this recursion
to ensure we stay within the ulimit and transition memory from pack
cache to standard malloc, or other mmap needs.

Inserting an unuse_pack() call prior to the recursion allows us to
avoid pinning the current window, making it available for garbage
collection if memory runs low.

This has been broken since at least before 1.5.1-rc1, and very
likely earlier than that.  Its fixed now.  :)

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-09 14:45:42 -07:00
Ramsay Jones 6e1c23442a Fix some warnings (on cygwin) to allow -Werror
When printing valuds of type uint32_t, we should use PRIu32, and should
not assume that it is unsigned int.  On 32-bit platforms, it could be
defined as unsigned long. The same caution applies to ntohl().

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-05 17:26:29 -07:00
Junio C Hamano bb1ab2db08 Merge branch 'j6t/mingw'
* j6t/mingw: (38 commits)
  compat/pread.c: Add a forward declaration to fix a warning
  Windows: Fix ntohl() related warnings about printf formatting
  Windows: TMP and TEMP environment variables specify a temporary directory.
  Windows: Make 'git help -a' work.
  Windows: Work around an oddity when a pipe with no reader is written to.
  Windows: Make the pager work.
  When installing, be prepared that template_dir may be relative.
  Windows: Use a relative default template_dir and ETC_GITCONFIG
  Windows: Compute the fallback for exec_path from the program invocation.
  Turn builtin_exec_path into a function.
  Windows: Use a customized struct stat that also has the st_blocks member.
  Windows: Add a custom implementation for utime().
  Windows: Add a new lstat and fstat implementation based on Win32 API.
  Windows: Implement a custom spawnve().
  Windows: Implement wrappers for gethostbyname(), socket(), and connect().
  Windows: Work around incompatible sort and find.
  Windows: Implement asynchronous functions as threads.
  Windows: Disambiguate DOS style paths from SSH URLs.
  Windows: A rudimentary poll() emulation.
  Windows: Implement start_command().
  ...
2008-07-02 21:57:52 -07:00
Junio C Hamano abf7e0df17 Merge branch 'lt/config-fsync'
* lt/config-fsync:
  Add config option to enable 'fsync()' of object files
  Split up default "i18n" and "branch" config parsing into helper routines
  Split up default "user" config parsing into helper routine
  Split up default "core" config parsing into helper routine
2008-06-25 13:19:49 -07:00
Jeff King 2beebd22f4 clone: create intermediate directories of destination repo
The shell version used to use "mkdir -p" to create the repo
path, but the C version just calls "mkdir". Let's replicate
the old behavior. We have to create the git and worktree
leading dirs separately; while most of the time, the
worktree dir contains the git dir (as .git), the user can
override this using GIT_WORK_TREE.

We can reuse safe_create_leading_directories, but we need to
make a copy of our const buffer to do so. Since
merge-recursive uses the same pattern, we can factor this
out into a global function. This has two other cleanup
advantages for merge-recursive:

  1. mkdir_p wasn't a very good name. "mkdir -p foo/bar" actually
     creates bar, but this function just creates the leading
     directories.

  2. mkdir_p took a mode argument, but it was completely
     ignored.

Acked-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-25 11:44:15 -07:00
Nicolas Pitre 99093238bb optimize verify-pack a bit
Using find_pack_entry_one() to get object offsets is rather suboptimal
when nth_packed_object_offset() can be used directly.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-24 23:58:57 -07:00
Jeff King 8e21d63b02 clone: create intermediate directories of destination repo
The shell version used to use "mkdir -p" to create the repo
path, but the C version just calls "mkdir". Let's replicate
the old behavior. We have to create the git and worktree
leading dirs separately; while most of the time, the
worktree dir contains the git dir (as .git), the user can
override this using GIT_WORK_TREE.

We can reuse safe_create_leading_directories, but we need to
make a copy of our const buffer to do so. Since
merge-recursive uses the same pattern, we can factor this
out into a global function. This has two other cleanup
advantages for merge-recursive:

  1. mkdir_p wasn't a very good name. "mkdir -p foo/bar" actually
     creates bar, but this function just creates the leading
     directories.

  2. mkdir_p took a mode argument, but it was completely
     ignored.

Acked-by: Daniel Barkalow <barkalow@iabervon.org>

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-24 23:23:21 -07:00
Nicolas Pitre 27d69a465d refactor pack structure allocation
New pack structures are currently allocated in 2 different places
and all members have to be initialized explicitly.  This is prone
to errors leading to segmentation faults as found by Teemu Likonen.

Let's have a common place where this structure is allocated, and have
all members explicitly initialized to zero.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-24 17:03:44 -07:00
Nicolas Pitre 8eca0b47ff implement some resilience against pack corruptions
We should be able to fall back to loose objects or alternative packs when
a pack becomes corrupted.  This is especially true when an object exists
in one pack only as a delta but its base object is corrupted.  Currently
there is no way to retrieve the former object even if the later is
available in another pack or loose.

This patch allows for a delta to be resolved (with a performance cost)
using a base object from a source other than the pack where that delta
is located.  Same thing for non-delta objects: rather than failing
outright, a search is made in other packs or used loose when the
currently active pack has it but corrupted.

Of course git will become extremely noisy with error messages when that
happens.  However, if the operation succeeds nevertheless, a simple
'git repack -a -f -d' will "fix" the corrupted repository given that all
corrupted objects have a good duplicate somewhere in the object store,
possibly manually copied from another source.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-23 21:29:33 -07:00
Patrick Higgins 6ff6af62ec Workaround for AIX mkstemp()
The AIX mkstemp will modify it's template parameter to an empty string if
the call fails. This caused a subsequent mkdir to fail.

Signed-off-by: Patrick Higgins <patrick.higgins@cexp.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-23 16:13:38 -07:00
Johannes Sixt 8385abfda5 Windows: Handle absolute paths in safe_create_leading_directories().
In this function we must be careful to handle drive-local paths else there
is a danger that it runs into an infinite loop.

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
2008-06-23 13:30:27 +02:00
Johannes Sixt 80ba074f41 Windows: Use the Windows style PATH separator ';'.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
2008-06-22 11:32:45 +02:00
Linus Torvalds aafe9fbaf4 Add config option to enable 'fsync()' of object files
As explained in the documentation[*] this is totally useless on
filesystems that do ordered/journalled data writes, but it can be a
useful safety feature on filesystems like HFS+ that only journal the
metadata, not the actual file contents.

It defaults to off, although we could presumably in theory some day
auto-enable it on a per-filesystem basis.

[*] Yes, I updated the docs for the thing.  Hell really _has_ frozen
    over, and the four horsemen are probably just beyond the horizon.
    EVERYBODY PANIC!

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-18 16:50:35 -07:00
Junio C Hamano 79c6dca413 sha1_file.c: simplify parse_pack_index()
It was implemented as a thin wrapper around an otherwise unused
helper function parse_pack_index_file().  The code becomes simpler
and easier to read by consolidating the two.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-16 22:19:00 -07:00
Junio C Hamano 3bfaf01857 create_tempfile: make sure that leading directories can be accessible by peers
In a shared repository, we should make sure adjust_shared_perm() is called
after creating the initial fan-out directories under objects/ directory.

Earlier an logico called the function only when mkdir() failed; we should
do so when mkdir() succeeded.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-16 22:02:12 -07:00
Linus Torvalds 1421c5f274 write_loose_object: don't bother trying to read an old object
Before even calling this, all callers have done a "has_sha1_file(sha1)"
or "has_loose_object(sha1)" check, so there is no point in doing a
second check.

If something races with us on object creation, we handle that in the
final link() that moves it to the right place.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-16 21:46:47 -07:00
Linus Torvalds c529d75a75 Simplify and rename find_sha1_file()
Now that we've made the loose SHA1 file reading more careful and
streamlined, we only use the old find_sha1_file() function for checking
whether a loose object file exists at all.

As such, the whole 'return stat information' part of it was just
pointless (nobody cares any more), and the naming of the function is not
really all that relevant either.

So simplify it to not do a 'stat()', but just an existence check (which
is what the callers want), and rename it to 'has_loose_object()' which
matches the use.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-14 14:39:22 -07:00
Linus Torvalds 44d1c19ee8 Make loose object file reading more careful
We used to do 'stat()+open()+mmap()+close()' to read the loose object
file data, which does work fine, but has a couple of problems:

 - it unnecessarily walks the filename twice (at 'stat()' time and then
   again to open it)

 - NFS generally has open-close consistency guarantees, which means that
   the initial 'stat()' was technically done outside of the normal
   consistency rules.

So change it to do 'open()+fstat()+mmap()+close()' instead, which avoids
both these issues.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-14 14:39:22 -07:00
Linus Torvalds 5723fe7e3c Avoid cross-directory renames and linking on object creation
Instead of creating new temporary objects in the top-level git object
directory, create them in the same directory they will finally end up in
anyway.  This avoids making the final atomic "rename to stable name"
operation be a cross-directory event, which makes it a lot easier for
various filesystems.

Several filesystems do things like change the inode number when moving
files across directories (or refuse to do it entirely).

In particular, it can also cause problems for NFS implementations that
change the filehandle of a file when it moves to a different directory,
like the old user-space NFS server did, and like the Linux knfsd still
does if you don't export your filesystems with 'no_subtree_check' or if
you export a filesystem that doesn't have stable inode numbers across
renames).

This change also obviously implies creating the object fan-out
subdirectory at tempfile creation time, rather than at the final
move_temp_to_file() time.  Which actually accounts for most of the size
of the patch.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-14 14:39:22 -07:00
Junio C Hamano 6483925999 sha1_file.c: dead code removal
write_sha1_from_fd() and write_sha1_to_fd() were dead code nobody called,
neither the latter's helper repack_object() was.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-13 23:00:51 -07:00
Linus Torvalds e9039dd351 Consolidate SHA1 object file close
This consolidates the common operations for closing the new temporary file
that we have written, before we move it into place with the final name.

There's some common code there (make it read-only and check for errors on
close), but more importantly, this also gives a single place to add an
fsync_or_die() call if we want to add a safe mode.

This was triggered due to Denis Bueno apparently twice being able to
corrupt his git repository on OS X due to an unlucky combination of kernel
crashes and a not-very-robust filesystem.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-10 22:23:18 -07:00
Junio C Hamano 6eec46bdda fix sha1_pack_index_name()
An earlier commit 633f43e (Remove redundant code, eliminate one static
variable, 2008-05-24) had a thinko (perhaps an eyeno) that broke
sha1_pack_index_name() function.  One symptom of this was that the http
walker is now completely broken.

This should fix it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-28 10:24:32 -07:00
Junio C Hamano b84c343c88 Merge branch 'db/clone-in-c'
* db/clone-in-c:
  Add test for cloning with "--reference" repo being a subset of source repo
  Add a test for another combination of --reference
  Test that --reference actually suppresses fetching referenced objects
  clone: fall back to copying if hardlinking fails
  builtin-clone.c: Need to closedir() in copy_or_link_directory()
  builtin-clone: fix initial checkout
  Build in clone
  Provide API access to init_db()
  Add a function to set a non-default work tree
  Allow for having for_each_ref() list extra refs
  Have a constant extern refspec for "--tags"
  Add a library function to add an alternate to the alternates file
  Add a lockfile function to append to a file
  Mark the list of refs to fetch as const

Conflicts:

	cache.h
	t/t5700-clone-reference.sh
2008-05-25 13:41:37 -07:00
Heikki Orsila 633f43e1f7 Remove redundant code, eliminate one static variable
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-24 22:05:06 -07:00
Nicolas Pitre bbac73117e add a force_object_loose() function
This is meant to force the creation of a loose object even if it
already exists packed.  Needed for the next commit.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-13 22:42:33 -07:00
Daniel Barkalow bef70b22ba Add a library function to add an alternate to the alternates file
This is in the core so that, if the alternates file has already been
read, the addition can be parsed and put into effect for the current
process.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-04 17:41:44 -07:00
Heikki Orsila c697ad143b Cleanup xread() loops to use read_in_full()
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-03 22:15:25 -07:00
Junio C Hamano 628522ec14 sha1-lookup: more memory efficient search in sorted list of SHA-1
Currently, when looking for a packed object from the pack idx, a
simple binary search is used.

A conventional binary search loop looks like this:

        unsigned lo, hi;
        do {
                unsigned mi = (lo + hi) / 2;
                int cmp = "entry pointed at by mi" minus "target";
                if (!cmp)
                        return mi; "mi is the wanted one"
                if (cmp > 0)
                        hi = mi; "mi is larger than target"
                else
                        lo = mi+1; "mi is smaller than target"
        } while (lo < hi);
	"did not find what we wanted"

The invariants are:

  - When entering the loop, 'lo' points at a slot that is never
    above the target (it could be at the target), 'hi' points at
    a slot that is guaranteed to be above the target (it can
    never be at the target).

  - We find a point 'mi' between 'lo' and 'hi' ('mi' could be
    the same as 'lo', but never can be as high as 'hi'), and
    check if 'mi' hits the target.  There are three cases:

     - if it is a hit, we have found what we are looking for;

     - if it is strictly higher than the target, we set it to
       'hi', and repeat the search.

     - if it is strictly lower than the target, we update 'lo'
       to one slot after it, because we allow 'lo' to be at the
       target and 'mi' is known to be below the target.

    If the loop exits, there is no matching entry.

When choosing 'mi', we do not have to take the "middle" but
anywhere in between 'lo' and 'hi', as long as lo <= mi < hi is
satisfied.  When we somehow know that the distance between the
target and 'lo' is much shorter than the target and 'hi', we
could pick 'mi' that is much closer to 'lo' than (hi+lo)/2,
which a conventional binary search would pick.

This patch takes advantage of the fact that the SHA-1 is a good
hash function, and as long as there are enough entries in the
table, we can expect uniform distribution.  An entry that begins
with for example "deadbeef..." is much likely to appear much
later than in the midway of a reasonably populated table.  In
fact, it can be expected to be near 87% (222/256) from the top
of the table.

This is a work-in-progress and has switches to allow easier
experiments and debugging.  Exporting GIT_USE_LOOKUP environment
variable enables this code.

On my admittedly memory starved machine, with a partial KDE
repository (3.0G pack with 95M idx):

    $ GIT_USE_LOOKUP=t git log -800 --stat HEAD >/dev/null
    3.93user 0.16system 0:04.09elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+55588minor)pagefaults 0swaps

Without the patch, the numbers are:

    $ git log -800 --stat HEAD >/dev/null
    4.00user 0.15system 0:04.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+60258minor)pagefaults 0swaps

In the same repository:

    $ GIT_USE_LOOKUP=t git log -2000 HEAD >/dev/null
    0.12user 0.00system 0:00.12elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+4241minor)pagefaults 0swaps

Without the patch, the numbers are:

    $ git log -2000 HEAD >/dev/null
    0.05user 0.01system 0:00.07elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+8506minor)pagefaults 0swaps

There isn't much time difference, but the number of minor faults
seems to show that we are touching much smaller number of pages,
which is expected.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-04-09 01:23:52 -07:00
Nicolas Pitre 70f5d5d31c fix unimplemented packed_object_info_detail() features
Since commit eb32d236df, there was a TODO
comment in packed_object_info_detail() about the SHA1 of base object to
OBJ_OFS_DELTA objects.  So here it is at last.

While at it, providing the actual storage size information as well is now
trivial.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-01 01:44:46 -08:00
Junio C Hamano c484166374 Merge branch 'jk/empty-tree'
* jk/empty-tree:
  add--interactive: handle initial commit better
  hard-code the empty tree object
2008-02-20 16:13:28 -08:00
Junio C Hamano ee4f06c0a6 Merge branch 'mk/maint-parse-careful'
* mk/maint-parse-careful:
  peel_onion: handle NULL
  check return value from parse_commit() in various functions
  parse_commit: don't fail, if object is NULL
  revision.c: handle tag->tagged == NULL
  reachable.c::process_tree/blob: check for NULL
  process_tag: handle tag->tagged == NULL
  check results of parse_commit in merge_bases
  list-objects.c::process_tree/blob: check for NULL
  reachable.c::add_one_tree: handle NULL from lookup_tree
  mark_blob/tree_uninteresting: check for NULL
  get_sha1_oneline: check return value of parse_object
  read_object_with_reference: don't read beyond the buffer
2008-02-18 20:56:01 -08:00
Martin Koegler 50974ec994 read_object_with_reference: don't read beyond the buffer
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-18 19:20:17 -08:00