Commit graph

647 commits

Author SHA1 Message Date
Junio C Hamano 7c2e8fc684 Merge branch 'tr/unpack-entry-use-after-free-fix'
* tr/unpack-entry-use-after-free-fix:
  unpack_entry: avoid freeing objects in base cache
2013-05-03 15:18:04 -07:00
Thomas Rast 756a042600 unpack_entry: avoid freeing objects in base cache
In the !delta_data error path of unpack_entry(), we run free(base).
This became a window for use-after-free() in abe601b (sha1_file:
remove recursion in unpack_entry, 2013-03-27), as follows:

Before abe601b, we got the 'base' from cache_or_unpack_entry(..., 0);
keep_cache=0 tells it to also remove that entry.  So the 'base' is at
this point not cached, and freeing it in the error path is the right
thing.

After abe601b, the structure changed: we use a three-phase approach
where phase 1 finds the innermost base or a base that is already in
the cache.  In phase 3 we therefore know that all bases we unpack are
not part of the delta cache yet.  (Observe that we pop from the cache
in phase 1, so this is also true for the very first base.)  So we make
no further attempts to look up the bases in the cache, and just call
add_delta_base_cache() on every base object we have assembled.

But the !delta_data error path remained unchanged, and now calls
free() on a base that has already been entered in the cache.  This
means that there is a use-after-free if we later use the same base
again.

So remove that free(); we are still going to use that data.

Reported-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-30 15:43:48 -07:00
Junio C Hamano 193e28f050 Merge branch 'tr/packed-object-info-wo-recursion'
Attempts to reduce the stack footprint of sha1_object_info()
and unpack_entry() codepaths.

* tr/packed-object-info-wo-recursion:
  sha1_file: remove recursion in unpack_entry
  Refactor parts of in_delta_base_cache/cache_or_unpack_entry
  sha1_file: remove recursion in packed_object_info
2013-04-18 11:46:23 -07:00
Junio C Hamano b9c78e9723 Merge branch 'jk/check-corrupt-objects-carefully'
Have the streaming interface and other codepaths more carefully
examine for corrupt objects.

* jk/check-corrupt-objects-carefully:
  clone: leave repo in place after checkout errors
  clone: run check_everything_connected
  clone: die on errors from unpack_trees
  add tests for cloning corrupted repositories
  streaming_write_entry: propagate streaming errors
  add test for streaming corrupt blobs
  avoid infinite loop in read_istream_loose
  read_istream_filtered: propagate read error from upstream
  check_sha1_signature: check return value from read_istream
  stream_blob_to_fd: detect errors reading from stream
2013-04-03 09:34:29 -07:00
Junio C Hamano 37ba4c61d0 Merge branch 'sw/safe-create-leading-dir-race'
* sw/safe-create-leading-dir-race:
  safe_create_leading_directories: fix race that could give a false negative
2013-04-02 15:09:48 -07:00
Jeff King f54fac5378 check_sha1_signature: check return value from read_istream
It's possible for read_istream to return an error, in which
case we just end up in an infinite loop (aside from EOF, we
do not even look at the result, but just feed it straight
into our running hash).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 13:46:55 -07:00
Thomas Rast abe601bba5 sha1_file: remove recursion in unpack_entry
Similar to the recursion in packed_object_info(), this leads to
problems on stack-space-constrained systems in the presence of long
delta chains.

We proceed in three phases:

1. Dig through the delta chain, saving each delta object's offsets and
   size on an ad-hoc stack.

2. Unpack the base object at the bottom.

3. Unpack and apply the deltas from the stack.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 13:25:16 -07:00
Thomas Rast 84dd81c126 Refactor parts of in_delta_base_cache/cache_or_unpack_entry
The delta base cache lookup and test were shared.  Refactor them;
we'll need both parts again.  Also, we'll use the clearing routine
later.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 13:24:43 -07:00
Steven Walter 928734d993 safe_create_leading_directories: fix race that could give a false negative
If two processes are racing to create the same directory tree, they
will both see that the directory doesn't exist, both try to mkdir(),
and one of them will fail.  This is okay, as we only care that the
directory gets created.  So, we add a check for EEXIST from mkdir,
and continue when the directory exists, taking the same codepath as
the case where the earlier stat() succeeds and finds a directory.

Signed-off-by: Steven Walter <stevenrwalter@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-26 21:07:42 -07:00
Thomas Rast 790d96c023 sha1_file: remove recursion in packed_object_info
packed_object_info() and packed_delta_info() were mutually recursive.
The former would handle ordinary types and defer deltas to the latter;
the latter would use the former to resolve the delta base.

This arrangement, however, leads to trouble with threaded index-pack
and long delta chains on platforms where thread stacks are small, as
happened on OS X (512kB thread stacks by default) with the chromium
repo.

The task of the two functions is not all that hard to describe without
any recursion, however.  It proceeds in three steps:

- determine the representation type and size, based on the outermost
  object (delta or not)

- follow through the delta chain, if any

- determine the object type from what is found at the end of the delta
  chain

The only complication stems from the error recovery.  If parsing fails
at any step, we want to mark that object (within the pack) as bad and
try getting the corresponding SHA1 from elsewhere.  If that also
fails, we want to repeat this process back up the delta chain until we
find a reasonable solution or conclude that there is no way to
reconstruct the object.  (This is conveniently checked by t5303.)

To achieve that within the pack, we keep track of the entire delta
chain in a stack.  When things go sour, we process that stack from the
top, marking entries as bad and attempting to re-resolve by sha1.  To
avoid excessive malloc(), the stack starts out with a small
stack-allocated array.  The choice of 64 is based on the default of
pack.depth, which is 50, in the hope that it covers "most" delta
chains without any need for malloc().

It's much harder to make the actual re-resolving by sha1 nonrecursive,
so we skip that.  If you can't afford *that* recursion, your
corruption problems are more serious than your stack size problems.

Reported-by: Stefan Zager <szager@google.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-25 15:48:18 -07:00
Nguyễn Thái Ngọc Duy 543c5caa6c count-objects: report garbage files in pack directory too
prepare_packed_git_one() is modified to allow count-objects to hook a
report function to so we don't need to duplicate the pack searching
logic in count-objects.c. When report_pack_garbage is NULL, the
overhead is insignificant.

The garbage is reported with warning() instead of error() in packed
garbage case because it's not an error to have garbage. Loose garbage
is still reported as errors and will be converted to warnings later.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-15 08:13:13 -08:00
Nguyễn Thái Ngọc Duy d90906a902 sha1_file: reorder code in prepare_packed_git_one()
The current loop does

	while (...) {
		if (it is not an .idx file)
			continue;
		process .idx file;
	}

and is reordered to

	while (...) {
		if (it is an .idx file) {
			process .idx file;
		}
	}

This makes it easier to add new extension file processing.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-13 07:42:05 -08:00
Michael Haggerty c595016402 link_alt_odb_entries(): take (char *, len) rather than two pointers
Change link_alt_odb_entries() to take the length of the "alt"
parameter rather than a pointer to the end of the "alt" string.  This
is the more common calling convention and simplifies the code a tiny
bit.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
2012-11-08 12:06:53 -05:00
Michael Haggerty 6eac50d827 link_alt_odb_entries(): use string_list_split_in_place()
Change link_alt_odb_entry() to take a NUL-terminated string instead of
(char *, len).  Use string_list_split_in_place() rather than inline
code in link_alt_odb_entries().

This approach saves some code and also avoids the (probably harmless)
error of passing a non-NUL-terminated string to is_absolute_path().

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
2012-11-08 12:06:53 -05:00
Joachim Schmitz a0788266d3 sha1_file.c: introduce get_max_fd_limit() helper
Not all platforms have getrlimit(), but there are other ways to see
the maximum number of files that a process can have open.  If
getrlimit() is unavailable, fall back to sysconf(_SC_OPEN_MAX) if
available, and use OPEN_MAX from <limits.h>.

Signed-off-by: Joachim Schmitz <jojo@schmitz-digital.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-24 09:46:01 -07:00
Junio C Hamano fbea95ce10 Merge branch 'hv/link-alt-odb-entry'
The code to avoid mistaken attempt to add the object directory
itself as its own alternate could read beyond end of a string while
comparison.

* hv/link-alt-odb-entry:
  link_alt_odb_entry: fix read over array bounds reported by valgrind
2012-07-30 12:55:01 -07:00
Heiko Voigt cb2912c324 link_alt_odb_entry: fix read over array bounds reported by valgrind
pfxlen can be longer than the path in objdir when relative_base
contains the path to gits object directory.  Here we are interested
in checking if ent->base[] (the part that corresponds to .git/objects)
is the same string as objdir, and the code NUL-terminated ent->base[]
to

	LEADING PATH\0XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\0

in preparation for these "duplicate check" step (before we return
from the function, the first NUL is turned into '/' so that we can
fill XX when probing for loose objects).  All we need to do is to
compare the string with the path to our object directory.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-29 18:02:51 -07:00
Junio C Hamano 4809ff858b Merge branch 'hv/submodule-alt-odb'
When peeking into object stores of submodules, the code forgot that they
might borrow objects from alternate object stores on their own.

By Heiko Voigt
* hv/submodule-alt-odb:
  teach add_submodule_odb() to look for alternates
2012-05-23 13:35:06 -07:00
Heiko Voigt 5e73633dbf teach add_submodule_odb() to look for alternates
Since we allow to link other object databases when loading a submodules
database we should also load possible alternates.

Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-14 11:56:42 -07:00
Pete Wyckoff 5eaeda70de remove blank filename in error message
When write_loose_object() finds that it is unable to
create a temporary file, it complains, for instance:

    unable to create temporary sha1 filename : Too many open files

That extra space was supposed to be the name of the file,
and will be an empty string if the git_mkstemps_mode() fails.

The name of the temporary file is unimportant; delete it.

Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-30 15:45:54 -07:00
Pete Wyckoff 82247e9bd5 remove superfluous newlines in error messages
The error handling routines add a newline.  Remove
the duplicate ones in error messages.

Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-30 15:45:51 -07:00
Nguyễn Thái Ngọc Duy 090ea12671 parse_object: avoid putting whole blob in core
Traditionally, all the callers of check_sha1_signature() first
called read_sha1_file() to prepare the whole object data in core,
and called this function.  The function is used to revalidate what
we read from the object database actually matches the object name we
used to ask for the data from the object database.

Update the API to allow callers to pass NULL as the object data, and
have the function read and hash the object data using streaming API
to recompute the object name, without having to hold everything in
core at the same time.  This is most useful in parse_object() that
parses a blob object, because this caller does not have to keep the
actual blob data around in memory after a "struct blob" is returned.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-07 09:07:38 -08:00
Junio C Hamano a09a0c2709 Merge branch 'jk/maint-avoid-streaming-filtered-contents' into maint
* jk/maint-avoid-streaming-filtered-contents:
  do not stream large files to pack when filters are in use
  teach dry-run convert_to_git not to require a src buffer
  teach convert_to_git a "dry run" mode
2012-03-04 22:16:40 -08:00
Junio C Hamano 31e3d834b3 Merge branch 'jk/maint-avoid-streaming-filtered-contents'
* jk/maint-avoid-streaming-filtered-contents:
  do not stream large files to pack when filters are in use
  teach dry-run convert_to_git not to require a src buffer
  teach convert_to_git a "dry run" mode
2012-02-26 23:05:38 -08:00
Jeff King 4f22b1015d do not stream large files to pack when filters are in use
Because git's object format requires us to specify the
number of bytes in the object in its header, we must know
the size before streaming a blob into the object database.
This is not a problem when adding a regular file, as we can
get the size from stat(). However, when filters are in use
(such as autocrlf, or the ident, filter, or eol
gitattributes), we have no idea what the ultimate size will
be.

The current code just punts on the whole issue and ignores
filter configuration entirely for files larger than
core.bigfilethreshold. This can generate confusing results
if you use filters for large binary files, as the filter
will suddenly stop working as the file goes over a certain
size.  Rather than try to handle unknown input sizes with
streaming, this patch just turns off the streaming
optimization when filters are in use.

This has a slight performance regression in a very specific
case: if you have autocrlf on, but no gitattributes, a large
binary file will avoid the streaming code path because we
don't know beforehand whether it will need conversion or
not. But if you are handling large binary files, you should
be marking them as such via attributes (or at least not
using autocrlf, and instead marking your text files as
such). And the flip side is that if you have a large
_non_-binary file, there is a correctness improvement;
before we did not apply the conversion at all.

The first half of the new t1051 script covers these failures
on input. The second half tests the matching output code
paths. These already work correctly, and do not need any
adjustment.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-24 14:18:20 -08:00
Junio C Hamano f3ccea8dd4 Merge branch 'nd/find-pack-entry-recent-cache-invalidation' into maint
* nd/find-pack-entry-recent-cache-invalidation:
  find_pack_entry(): do not keep packed_git pointer locally
  sha1_file.c: move the core logic of find_pack_entry() into fill_pack_entry()
2012-02-21 14:56:36 -08:00
Junio C Hamano c6a4e3f7a7 Merge branch 'mm/empty-loose-error-message' into maint
* mm/empty-loose-error-message:
  fsck: give accurate error message on empty loose object files
2012-02-16 14:00:25 -08:00
Junio C Hamano dd5253b4bd Merge branch 'nd/find-pack-entry-recent-cache-invalidation'
* nd/find-pack-entry-recent-cache-invalidation:
  find_pack_entry(): do not keep packed_git pointer locally
  sha1_file.c: move the core logic of find_pack_entry() into fill_pack_entry()
2012-02-12 22:43:03 -08:00
Junio C Hamano 8c18a6f3fa Merge branch 'mm/empty-loose-error-message'
* mm/empty-loose-error-message:
  fsck: give accurate error message on empty loose object files
2012-02-12 22:42:02 -08:00
Matthieu Moy 33e42de0d2 fsck: give accurate error message on empty loose object files
Since 3ba7a06552 (A loose object is not corrupt if it
cannot be read due to EMFILE), "git fsck" on a repository with an empty
loose object file complains with the error message

  fatal: failed to read object <sha1>: Invalid argument

This comes from a failure of mmap on this empty file, which sets errno to
EINVAL. Instead of calling xmmap on empty file, we display a clean error
message ourselves, and return a NULL pointer. The new message is

  error: object file .git/objects/09/<rest-of-sha1> is empty
  fatal: loose object <sha1> (stored in .git/objects/09/<rest-of-sha1>) is corrupt

The second line was already there before the regression in 3ba7a06552,
and the first is an additional message, that should help diagnosing the
problem for the user.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-06 11:05:36 -08:00
Nguyễn Thái Ngọc Duy c01f51cc75 find_pack_entry(): do not keep packed_git pointer locally
Commit f7c22cc (always start looking up objects in the last used pack
first - 2007-05-30) introduce a static packed_git* pointer as an
optimization.  The kept pointer however may become invalid if
free_pack_by_name() happens to free that particular pack.

Current code base does not access packs after calling
free_pack_by_name() so it should not be a problem. Anyway, move the
pointer out so that free_pack_by_name() can reset it to avoid running
into troubles in future.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-01 14:12:42 -08:00
Nguyễn Thái Ngọc Duy 95099731bf sha1_file.c: move the core logic of find_pack_entry() into fill_pack_entry()
The new helper function implements the logic to find the offset for the
object in one pack and fill a pack_entry structure. The next patch will
restructure the loop and will call the helper from two places.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-01 14:12:41 -08:00
Ævar Arnfjörð Bjarmason ab1900a36e Appease Sun Studio by renaming "tmpfile"
On Solaris the system headers define the "tmpfile" name, which'll
cause Git compiled with Sun Studio 12 Update 1 to whine about us
redefining the name:

    "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile     (E_PRAGMA_REDEFINE_STATIC)
    "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)
    "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile   (E_PRAGMA_REDEFINE_STATIC)
    "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile    (E_PRAGMA_REDEFINE_STATIC)

Just renaming the "tmpfile" variable to "tmp_file" in the relevant
places is the easiest way to fix this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-21 10:21:04 -08:00
Junio C Hamano 48b303675a Merge branch 'jc/stream-to-pack'
* jc/stream-to-pack:
  bulk-checkin: replace fast-import based implementation
  csum-file: introduce sha1file_checkpoint
  finish_tmp_packfile(): a helper function
  create_tmp_packfile(): a helper function
  write_pack_header(): a helper function

Conflicts:
	pack.h
2011-12-16 22:33:40 -08:00
Junio C Hamano df6246ed78 Merge branch 'nd/misc-cleanups' into maint
* nd/misc-cleanups:
  unpack_object_header_buffer(): clear the size field upon error
  tree_entry_interesting: make use of local pointer "item"
  tree_entry_interesting(): give meaningful names to return values
  read_directory_recursive: reduce one indentation level
  get_tree_entry(): do not call find_tree_entry() on an empty tree
  tree-walk.c: do not leak internal structure in tree_entry_len()
2011-12-13 22:02:51 -08:00
Junio C Hamano 62cdb6b23a Merge branch 'nd/misc-cleanups'
* nd/misc-cleanups:
  unpack_object_header_buffer(): clear the size field upon error
  tree_entry_interesting: make use of local pointer "item"
  tree_entry_interesting(): give meaningful names to return values
  read_directory_recursive: reduce one indentation level
  get_tree_entry(): do not call find_tree_entry() on an empty tree
  tree-walk.c: do not leak internal structure in tree_entry_len()
2011-12-05 15:10:20 -08:00
Junio C Hamano 568508e765 bulk-checkin: replace fast-import based implementation
This extends the earlier approach to stream a large file directly from the
filesystem to its own packfile, and allows "git add" to send large files
directly into a single pack. Older code used to spawn fast-import, but the
new bulk-checkin API replaces it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-01 11:46:09 -08:00
Ramkumar Ramachandra 5e12e78e52 sha1_file: don't mix enum with int
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-15 16:09:20 -08:00
Junio C Hamano ea4f9685cb unpack_object_header_buffer(): clear the size field upon error
The callers do not use the returned size when the function says
it did not use any bytes and sets the type to OBJ_BAD, so this
should not matter in practice, but it is a good code hygiene
anyway.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-27 11:42:57 -07:00
Junio C Hamano 2070950633 Merge branch 'jk/maint-pack-objects-compete-with-delete'
* jk/maint-pack-objects-compete-with-delete:
  downgrade "packfile cannot be accessed" errors to warnings
  pack-objects: protect against disappearing packs
2011-10-21 16:04:33 -07:00
Jeff King 58a6a9cc43 downgrade "packfile cannot be accessed" errors to warnings
These can happen if another process simultaneously prunes a
pack. But that is not usually an error condition, because a
properly-running prune should have repacked the object into
a new pack. So we will notice that the pack has disappeared
unexpectedly, print a message, try other packs (possibly
after re-scanning the list of packs), and find it in the new
pack.

Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-14 11:43:09 -07:00
Jeff King 4c08018204 pack-objects: protect against disappearing packs
It's possible that while pack-objects is running, a
simultaneously running prune process might delete a pack
that we are interested in. Because we load the pack indices
early on, we know that the pack contains our item, but by
the time we try to open and map it, it is gone.

Since c715f78, we already protect against this in the normal
object access code path, but pack-objects accesses the packs
at a lower level.  In the normal access path, we call
find_pack_entry, which will call find_pack_entry_one on each
pack index, which does the actual lookup. If it gets a hit,
we will actually open and verify the validity of the
matching packfile (using c715f78's is_pack_valid). If we
can't open it, we'll issue a warning and pretend that we
didn't find it, causing us to go on to the next pack (or on
to loose objects).

Furthermore, we will cache the descriptor to the opened
packfile. Which means that later, when we actually try to
access the object, we are likely to still have that packfile
opened, and won't care if it has been unlinked from the
filesystem.

Notice the "likely" above. If there is another pack access
in the interim, and we run out of descriptors, we could
close the pack. And then a later attempt to access the
closed pack could fail (we'll try to re-open it, of course,
but it may have been deleted). In practice, this doesn't
happen because we tend to look up items and then access them
immediately.

Pack-objects does not follow this code path. Instead, it
accesses the packs at a much lower level, using
find_pack_entry_one directly. This means we skip the
is_pack_valid check, and may end up with the name of a
packfile, but no open descriptor.

We can add the same is_pack_valid check here. Unfortunately,
the access patterns of pack-objects are not quite as nice
for keeping lookup and object access together. We look up
each object as we find out about it, and the only later when
writing the packfile do we necessarily access it. Which
means that the opened packfile may be closed in the interim.

In practice, however, adding this check still has value, for
three reasons.

  1. If you have a reasonable number of packs and/or a
     reasonable file descriptor limit, you can keep all of
     your packs open simultaneously. If this is the case,
     then the race is impossible to trigger.

  2. Even if you can't keep all packs open at once, you
     may end up keeping the deleted one open (i.e., you may
     get lucky).

  3. The race window is shortened. You may notice early that
     the pack is gone, and not try to access it. Triggering
     the problem without this check means deleting the pack
     any time after we read the list of index files, but
     before we access the looked-up objects.  Triggering it
     with this check means deleting the pack means deleting
     the pack after we do a lookup (and successfully access
     the packfile), but before we access the object. Which
     is a smaller window.

Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-14 11:42:37 -07:00
Junio C Hamano e99f8c6dcf Merge branch 'wh/normalize-alt-odb-path'
* wh/normalize-alt-odb-path:
  sha1_file: normalize alt_odb path before comparing and storing
2011-10-05 12:36:22 -07:00
Hui Wang 5bdf0a8468 sha1_file: normalize alt_odb path before comparing and storing
When it needs to compare and add an alt object path to the
alt_odb_list, we normalize this path first since comparing normalized
path is easy to get correct result.

Use strbuf to replace some string operations, since it is cleaner and
safer.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Hui Wang <Hui.Wang@windriver.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-09-07 11:47:43 -07:00
Junio C Hamano 2478bd8318 Merge branch 'jc/maint-clone-alternates'
* jc/maint-clone-alternates:
  clone: clone from a repository with relative alternates
  clone: allow more than one --reference

Conflicts:
	builtin/clone.c
2011-08-28 21:19:21 -07:00
Junio C Hamano 6fcb384869 Merge branch 'rt/zlib-smaller-window'
* rt/zlib-smaller-window:
  test: consolidate definition of $LF
  Tolerate zlib deflation with window size < 32Kb
2011-08-23 15:40:33 -07:00
Junio C Hamano e6baf4a1ae clone: clone from a repository with relative alternates
Cloning from a local repository blindly copies or hardlinks all the files
under objects/ hierarchy. This results in two issues:

 - If the repository cloned has an "objects/info/alternates" file, and the
   command line of clone specifies --reference, the ones specified on the
   command line get overwritten by the copy from the original repository.

 - An entry in a "objects/info/alternates" file can specify the object
   stores it borrows objects from as a path relative to the "objects/"
   directory. When cloning a repository with such an alternates file, if
   the new repository is not sitting next to the original repository, such
   relative paths needs to be adjusted so that they can be used in the new
   repository.

This updates add_to_alternates_file() to take the path to the alternate
object store, including the "/objects" part at the end (earlier, it was
taking the path to $GIT_DIR and was adding "/objects" itself), as it is
technically possible to specify in objects/info/alternates file the path
of a directory whose name does not end with "/objects".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-23 09:56:14 -07:00
Roberto Tyley 7f684a2aff Tolerate zlib deflation with window size < 32Kb
Git currently reports loose objects as 'corrupt' if they've been
deflated using a window size less than 32Kb, because the
experimental_loose_object() function doesn't recognise the header
byte as a zlib header. This patch makes the function tolerant of
all valid window sizes (15-bit to 8-bit) - but doesn't sacrifice
it's accuracy in distingushing the standard loose-object format
from the experimental (now abandoned) format.

On memory constrained systems zlib may use a much smaller window
size - working on Agit, I found that Android uses a 4KB window;
giving a header byte of 0x48, not 0x78. Consequently all loose
objects generated appear 'corrupt', which is why Agit is a read-only
Git client at this time - I don't want my client to generate Git
repos that other clients treat as broken :(

This patch makes Git tolerant of different deflate settings - it
might appear that it changes experimental_loose_object() to the point
where it could incorrectly identify the experimental format as the
standard one, but the two criteria (bitmask & checksum) can only
give a false result for an experimental object where both of the
following are true:

1) object size is exactly 8 bytes when uncompressed (bitmask)
2) [single-byte in-pack git type&size header] * 256
   + [1st byte of the following zlib header] % 31 = 0 (checksum)

As it happens, for all possible combinations of valid object type
(1-4) and window bits (0-7), the only time when the checksum will be
divisible by 31 is for 0x1838 - ie object type *1*, a Commit - which,
due the fields all Commit objects must contain, could never be as
small as 8 bytes in size.

Given this, the combination of the two criteria (bitmask & checksum)
always correctly determines the buffer format, and is more tolerant
than the previous version.

The alternative to this patch is simply removing support for the
experimental format, which I am also totally cool with.

References:

Android uses a 4KB window for deflation:
http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=luni/src/main/native/java_util_zip_Deflater.cpp;h=c0b2feff196e63a7b85d97cf9ae5bb2583409c28;hb=refs/heads/gingerbread#l53

Code snippet searching for false positives with the zlib checksum:
https://gist.github.com/1118177

Signed-off-by: Roberto Tyley <roberto.tyley@guardian.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-11 13:02:47 -07:00
Junio C Hamano 96790ca029 Merge branch 'jc/pack-order-tweak'
* jc/pack-order-tweak:
  pack-objects: optimize "recency order"
  core: log offset pack data accesses happened
2011-08-05 14:54:57 -07:00
Junio C Hamano d48929e1c3 Merge branch 'jc/legacy-loose-object' into maint
* jc/legacy-loose-object:
  sha1_file.c: "legacy" is really the current format
2011-08-01 14:43:58 -07:00
Junio C Hamano d907bf8ef3 Merge branch 'jc/index-pack'
* jc/index-pack:
  verify-pack: use index-pack --verify
  index-pack: show histogram when emulating "verify-pack -v"
  index-pack: start learning to emulate "verify-pack -v"
  index-pack: a miniscule refactor
  index-pack --verify: read anomalous offsets from v2 idx file
  write_idx_file: need_large_offset() helper function
  index-pack: --verify
  write_idx_file: introduce a struct to hold idx customization options
  index-pack: group the delta-base array entries also by type

Conflicts:
	builtin/verify-pack.c
	cache.h
	sha1_file.c
2011-07-19 09:54:51 -07:00
Junio C Hamano eb4f4076aa Merge branch 'jc/zlib-wrap'
* jc/zlib-wrap:
  zlib: allow feeding more than 4GB in one go
  zlib: zlib can only process 4GB at a time
  zlib: wrap deflateBound() too
  zlib: wrap deflate side of the API
  zlib: wrap inflateInit2 used to accept only for gzip format
  zlib: wrap remaining calls to direct inflate/inflateEnd
  zlib wrapper: refactor error message formatter

Conflicts:
	sha1_file.c
2011-07-19 09:33:04 -07:00
Junio C Hamano 5f2e448370 Merge branch 'jc/legacy-loose-object'
* jc/legacy-loose-object:
  sha1_file.c: "legacy" is really the current format
2011-07-13 14:31:34 -07:00
Junio C Hamano 5f44324d88 core: log offset pack data accesses happened
In a workload other than "git log" (without pathspec nor any option that
causes us to inspect trees and blobs), the recency pack order is said to
cause the access jump around quite a bit. Add a hook to allow us observe
how bad it is.

"git config core.logpackaccess /var/tmp/pal.txt" will give you the log
in the specified file.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-06 19:09:29 -07:00
Junio C Hamano ef49a7a012 zlib: zlib can only process 4GB at a time
The size of objects we read from the repository and data we try to put
into the repository are represented in "unsigned long", so that on larger
architectures we can handle objects that weigh more than 4GB.

But the interface defined in zlib.h to communicate with inflate/deflate
limits avail_in (how many bytes of input are we calling zlib with) and
avail_out (how many bytes of output from zlib are we ready to accept)
fields effectively to 4GB by defining their type to be uInt.

In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
avail_in field of the stream, but that will truncate the high octets of
the real size. The worst part of this story is that we often pass around
z_stream (the state object used by zlib) to keep track of the number of
used bytes in input/output buffer by inspecting these two fields, which
practically limits our callchain to the same 4GB limit.

Wrap z_stream in another structure git_zstream that can express avail_in
and avail_out in unsigned long. For now, just die() when the caller gives
a size that cannot be given to a single zlib call. In later patches in the
series, we would make git_inflate() and git_deflate() internally loop to
give callers an illusion that our "improved" version of zlib interface can
operate on a buffer larger than 4GB in one go.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:52:15 -07:00
Junio C Hamano 55bb5c9147 zlib: wrap deflate side of the API
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use
of deflateInit2 in remote-curl.c to tell the library to use gzip header
and trailer in git_deflate_init_gzip().

There is only one caller that cares about the status from deflateEnd().
Introduce git_deflate_end_gently() to let that sole caller retrieve the
status and act on it (i.e. die) for now, but we would probably want to
make inflate_end/deflate_end die when they ran out of memory and get
rid of the _gently() kind.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:10:29 -07:00
Junio C Hamano cc5c54e78b sha1_file.c: "legacy" is really the current format
Every time I look at the read-loose-object codepath, legacy_loose_object()
function makes my brain go through mental contortion. When we were playing
with the experimental loose object format, it may have made sense to call
the traditional format "legacy", in the hope that the experimental one
will some day replace it to become official, but it never happened.

This renames the function (and negates its return value) to detect if we
are looking at the experimental format, and move the code around in its
caller which used to do "if we are looing at legacy, do this special case,
otherwise the normal case is this". The codepath to read from the loose
objects in experimental format is the "unlikely" case.

Someday after Git 2.0, we should drop the support of this format.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-08 16:39:33 -07:00
Junio C Hamano 3de89c9d42 verify-pack: use index-pack --verify
This finally gets rid of the inefficient verify-pack implementation that
walks objects in the packfile in their object name order and replaces it
with a call to index-pack --verify. As a side effect, it also removes
packed_object_info_detail() API which is rather expensive.

As this changes the way errors are reported (verify-pack used to rely on
the usual runtime error detection routine unpack_entry() to diagnose the
CRC errors in an entry in the *.idx file; index-pack --verify checks the
whole *.idx file in one go), update a test that expected the string "CRC"
to appear in the error message.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-05 22:45:38 -07:00
Jim Meyering 23c7df6bdd sha1_file: use the correct type (ssize_t, not size_t) for read-style function
Using an unsigned type, we would fail to detect a read error and then
proceed to try to write (size_t)-1 bytes.

Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26 11:25:59 -07:00
Junio C Hamano 5cfe4256d9 Merge branch 'jc/bigfile'
* jc/bigfile:
  Bigfile: teach "git add" to send a large file straight to a pack
  index_fd(): split into two helper functions
  index_fd(): turn write_object and format_check arguments into one flag
2011-05-25 16:23:26 -07:00
Junio C Hamano f0270efd46 sha1_file.c: expose helpers to read loose objects
Make map_sha1_file(), parse_sha1_header() and unpack_sha1_header()
available to the streaming read API by exporting them via cache.h header
file.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 23:16:53 -07:00
Junio C Hamano f8c8abc5b7 unpack_object_header(): make it public
This function is used to read and skip over the per-object header
in a packfile.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 18:38:54 -07:00
Junio C Hamano 5266d369b2 sha1_object_info_extended(): hint about objects in delta-base cache
An object found in the delta-base cache is not guaranteed to
stay there, but we know it came from a pack and it is likely
to give us a quick access if we read_sha1_file() it right now,
which is a piece of useful information.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 18:38:50 -07:00
Junio C Hamano 61d7503da1 Merge branch 'jc/replacing'
* jc/replacing:
  read_sha1_file(): allow selective bypassing of replacement mechanism
  inline lookup_replace_object() calls
  read_sha1_file(): get rid of read_sha1_file_repl() madness
  t6050: make sure we test not just commit replacement
  Declare lookup_replace_object() in cache.h, not in commit.h

Conflicts:
	environment.c
2011-05-19 20:37:21 -07:00
Junio C Hamano 9a49059022 sha1_object_info_extended(): expose a bit more info
The original interface for sha1_object_info() takes an object name and
gives back a type and its size (the latter is given only when it was
asked).  The new interface wraps its implementation and exposes a bit
more pieces of information that the interface used to discard, namely:

 - where the object is stored (loose? cached? packed?)
 - if packed, where in which packfile?

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * In the earlier round, this used u.pack.delta to record the length of
   the delta chain, but the caller is not necessarily interested in the
   length of the delta chain per-se, but may only want to know if it is a
   delta against another object or is stored as a deflated data. Calling
   packed_object_info_detail() involves walking the reverse index chain to
   compute the store size of the object and is unnecessarily expensive.

   We could resurrect the code if a new caller wants to know, but I doubt
   it.
2011-05-19 14:22:47 -07:00
Junio C Hamano b9a62cbeb9 packed_object_info_detail(): do not return a string
Instead return an integer that can be given to typename() if
the caller wants a string, just like everybody else does.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-16 22:13:34 -07:00
Junio C Hamano 02071b27f1 Merge branches 'jc/convert', 'jc/bigfile' and 'jc/replacing' into jc/streaming
* jc/convert:
  convert: make it harder to screw up adding a conversion attribute
  convert: make it safer to add conversion attributes
  convert: give saner names to crlf/eol variables, types and functions
  convert: rename the "eol" global variable to "core_eol"

* jc/bigfile:
  Bigfile: teach "git add" to send a large file straight to a pack
  index_fd(): split into two helper functions
  index_fd(): turn write_object and format_check arguments into one flag

* jc/replacing:
  read_sha1_file(): allow selective bypassing of replacement mechanism
  inline lookup_replace_object() calls
  read_sha1_file(): get rid of read_sha1_file_repl() madness
  t6050: make sure we test not just commit replacement
  Declare lookup_replace_object() in cache.h, not in commit.h
2011-05-15 16:30:13 -07:00
Junio C Hamano f4e516834e git_open_noatime(): drop unused parameter
Since commit c793430 (Limit file descriptors used by packs, 2011-02-28),
the extra parameter added in f2e872aa (Work around EMFILE when there are
too many pack files, 2010-11-01) is not used anymore.

Remove it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
2011-05-15 15:24:52 -07:00
Junio C Hamano ccf5ace0dc sha1_file: typofix
The number zero is spelled "zero", not "zer0".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 15:24:36 -07:00
Junio C Hamano 5bf29b9500 read_sha1_file(): allow selective bypassing of replacement mechanism
The way "object replacement" mechanism was tucked to the read_sha1_file()
interface was suboptimal in a couple of ways:

 - Callers that want it to die with useful diagnosis upon seeing a corrupt
   object does not have a way to say that they do not want any object
   replacement.

 - Callers who do not want it to die but want to handle the errors
   themselves are told to arrange to call read_object(), but the function
   does not use the replacement mechanism, and also it is a file scope
   static function that not many callers can call to begin with.

This adds a read_sha1_file_extended() that takes a set of flags; the
callers of read_sha1_file() passes a flag READ_SHA1_FILE_REPLACE to ask
for object replacement mechanism to kick in.

Later, we could add another flag bit to tell the function to return an
error instead of dying and then remove the misguided "call read_object()
yourself".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 15:23:34 -07:00
Junio C Hamano 4bbf5a2615 read_sha1_file(): get rid of read_sha1_file_repl() madness
Most callers want to silently get a replacement object, and they do not
care what the real name of the replacement object is.  Worse yet, no sane
interface to return the underlying object without replacement is provided.

Remove the function and make only the few callers that want the name of
the replacement object find it themselves.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 15:23:33 -07:00
Junio C Hamano 4dd1fbc7b1 Bigfile: teach "git add" to send a large file straight to a pack
When adding a new content to the repository, we have always slurped
the blob in its entirety in-core first, and computed the object name
and compressed it into a loose object file.  Handling large binary
files (e.g.  video and audio asset for games) has been problematic
because of this design.

At the middle level of "git add" callchain is an internal API
index_fd() that takes an open file descriptor to read from the
working tree file being added with its size. Teach it to call out to
fast-import when adding a large blob.

The write-out codepath in entry.c::write_entry() should be taught to
stream, instead of reading everything in core. This should not be so
hard to implement, especially if we limit ourselves only to loose
object files and non-delta representation in packfiles.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-13 16:11:18 -07:00
Junio C Hamano 7b41e1e15b index_fd(): split into two helper functions
Split out the case where we do not know the size of the input (hence we
read everything into a strbuf before doing anything) to index_pipe(), and
the other case where we mmap or read the whole data to index_bulk().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 11:58:19 -07:00
Junio C Hamano c4ce46fc7a index_fd(): turn write_object and format_check arguments into one flag
The "format_check" parameter tucked after the existing parameters is too
ugly an afterthought to live in any reasonable API.

Combine it with the other boolean parameter "write_object" into a single
"flags" parameter.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 11:58:19 -07:00
Jim Meyering 0353a0c4ec remove doubled words, e.g., s/to to/to/, and fix related typos
I found that some doubled words had snuck back into projects from which
I'd already removed them, so now there's a "syntax-check" makefile rule in
gnulib to help prevent recurrence.

Running the command below spotted a few in git, too:

  git ls-files | xargs perl -0777 -n \
    -e 'while (/\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt])\s+\1\b/gims)' \
    -e '{$n=($` =~ tr/\n/\n/ + 1); ($v=$&)=~s/\n/\\n/g;' \
    -e 'print "$ARGV:$n:$v\n"}'

Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-13 11:59:11 -07:00
Junio C Hamano ad7bb2f68c Merge branch 'jc/maint-rerere-in-workdir'
* jc/maint-rerere-in-workdir:
  rerere: make sure it works even in a workdir attached to a young repository
2011-03-26 20:13:16 -07:00
Junio C Hamano 90a6464b4a rerere: make sure it works even in a workdir attached to a young repository
The git-new-workdir script in contrib/ makes a new work tree by sharing
many subdirectories of the .git directory with the original repository.
When rerere.enabled is set in the original repository, but the user has
not encountered any conflicts yet, the original repository may not yet
have .git/rr-cache directory.

When rerere wants to run in a new work tree created from such a young
original repository, it fails to mkdir(2) .git/rr-cache that is a symlink
to a yet-to-be-created directory.

There are three possible approaches to this:

 - A naive solution is not to create a symlink in the git-new-workdir
   script to a directory the original does not have (yet).  This is not a
   solution, as we tend to lazily create subdirectories of .git/, and
   having rerere.enabled configuration set is a strong indication that the
   user _wants_ to have this lazy creation to happen;

 - We could always create .git/rr-cache upon repository creation.  This is
   tempting but will not help people with existing repositories.

 - Detect this case by seeing that mkdir(2) failed with EEXIST, checking
   that the path is a symlink, and try running mkdir(2) on the link
   target.

This patch solves the issue by doing the third one.

Strictly speaking, this is incomplete.  It does not attempt to handle
relative symbolic link that points into the original repository, but this
is good enough to help people who use contrib/workdir/git-new-workdir
script.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-23 16:05:44 -07:00
Junio C Hamano 3ed8868474 Merge branch 'jn/maint-c99-format'
* jn/maint-c99-format:
  unbreak and eliminate NO_C99_FORMAT
  mktag: avoid %td in format string
2011-03-23 14:55:46 -07:00
Jonathan Nieder 28bd70d811 unbreak and eliminate NO_C99_FORMAT
In the spirit of v1.5.0.2~21 (Check for PRIuMAX rather than
NO_C99_FORMAT in fast-import.c, 2007-02-20), use PRIuMAX from
git-compat-util.h on all platforms instead of C99-specific formats
like %zu with dangerous fallbacks to %u or %lu.

So now C99-challenged platforms can build git without provoking
warnings or errors from printf, even if pointers do not have the same
size as an int or long.

The need for a fallback PRIuMAX is detected in git-compat-util.h with
"#ifndef PRIuMAX".  So while at it, simplify the Makefile and configure
script by eliminating the NO_C99_FORMAT knob altogether.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-17 15:30:49 -07:00
Junio C Hamano 674ef90904 Merge branch 'sp/maint-fd-limit'
* sp/maint-fd-limit:
  sha1_file.c: Don't retain open fds on small packs
  mingw: add minimum getrlimit() compatibility stub
  Limit file descriptors used by packs
2011-03-15 14:22:23 -07:00
Shawn O. Pearce d131b7afea sha1_file.c: Don't retain open fds on small packs
If a pack file is small enough that its entire contents fits within
one mmap window, mmap the file and then immediately close its file
descriptor.  This reduces the number of file descriptors that are
needed to read from repositories with many tiny pack files, such
as one that has received 1000 pushes (and created 1000 small pack
files) since its last repack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-02 11:25:30 -08:00
Shawn O. Pearce c7934306d1 Limit file descriptors used by packs
Rather than using 'errno == EMFILE' after a failed open() call
to indicate the process is out of file descriptors and an LRU
pack window should be closed, place a hard upper limit on the
number of open packs based on the actual rlimit of the process.

By using a hard upper limit that is below the rlimit of the current
process it is not necessary to check for EMFILE on every single
fd-allocating system call.  Instead reserving 25 file descriptors
makes it safe to assume the system call won't fail due to being over
the filedescriptor limit.  Here 25 is chosen as a WAG, but considers
3 for stdin/stdout/stderr, and at least a few for other Git code
to operate on temporary files.  An additional 20 is reserved as it
is not known what the C library needs to perform other services on
Git's behalf, such as nsswitch or name resolution.

This fixes a case where running `git gc --auto` in a repository
with more than 1024 packs (but an rlimit of 1024 open fds) fails
due to the temporary output file not being able to allocate a
file descriptor.  The output file is opened by pack-objects after
object enumeration and delta compression are done, both of which
have already opened all of the packs and fully populated the file
descriptor table.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-28 13:08:31 -08:00
Junio C Hamano fc7ae9c156 Merge branch 'nd/hash-object-sanity'
* nd/hash-object-sanity:
  Make hash-object more robust against malformed objects

Conflicts:
	cache.h
2011-02-27 21:58:30 -08:00
Jonathan Nieder dab0d4108d correct type of EMPTY_TREE_SHA1_BIN
Functions such as hashcmp that expect a binary SHA-1 value take
parameters of type "unsigned char *" to avoid accepting a textual
SHA-1 passed by mistake.  Unfortunately, this means passing the string
literal EMPTY_TREE_SHA1_BIN requires an ugly cast.  Tweak the
definition of EMPTY_TREE_SHA1_BIN to produce a value of more
convenient type.

In the future the definition might change to

	extern const unsigned char empty_tree_sha1_bin[20];
	#define EMPTY_TREE_SHA1_BIN empty_tree_sha1_bin

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-14 10:48:06 -08:00
Nguyễn Thái Ngọc Duy c4d9986f5f sha1_object_info: examine cached_object store too
Cached object store was added in d66b37b (Add pretend_sha1_file()
interface. - 2007-02-04) as a way to temporarily inject some objects
to object store.

But only read_sha1_file() knows about this store. While it will return
an object from this store, sha1_object_info() will happily say
"object not found".

Teach sha1_object_info() about the cached store for consistency.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-07 15:05:48 -08:00
Nguyễn Thái Ngọc Duy c597ba8010 sha1_file.c: move find_cached_object up so sha1_object_info can use it
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-07 15:05:46 -08:00
Nguyễn Thái Ngọc Duy c879daa237 Make hash-object more robust against malformed objects
Commits, trees and tags have structure. Don't let users feed git
with malformed ones. Sooner or later git will die() when
encountering them.

Note that this patch does not check semantics. A tree that points
to non-existent objects is perfectly OK (and should be so, users
may choose to add commit first, then its associated tree for example).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-07 15:05:25 -08:00
Björn Steinbrink 25f3af3f9d Correctly report corrupted objects
The errno check added in commit 3ba7a06 "A loose object is not corrupt
if it cannot be read due to EMFILE" only checked for whether errno is
not ENOENT and thus incorrectly treated "no error" as an error
condition.

Because of that, it never reached the code path that would report that
the object is corrupted and instead caused funny errors like:

  fatal: failed to read object 333c4768ce595793fdab1ef3a036413e2a883853: Success

So we have to extend the check to cover the case in which the object
file was successfully read, but its contents are corrupted.

Reported-by: Will Palmer <wmpalmer@gmail.com>
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-20 13:18:51 -08:00
Junio C Hamano 39f04dbaac Merge branch 'jn/thinner-wrapper'
* jn/thinner-wrapper:
  Remove pack file handling dependency from wrapper.o
  pack-objects: mark file-local variable static
  wrapper: give zlib wrappers their own translation unit
  strbuf: move strbuf_branchname to sha1_name.c
  path helpers: move git_mkstemp* to wrapper.c
  wrapper: move odb_* to environment.c
  wrapper: move xmmap() to sha1_file.c
2010-12-03 16:13:06 -08:00
Jonathan Nieder e050029385 Remove pack file handling dependency from wrapper.o
As v1.7.0-rc0~43 (slim down "git show-index", 2010-01-21) explains,
use of xmalloc() brings in a dependency on zlib, the sha1 lib, and the
rest of git's object file access machinery via try_to_free_pack_memory.
That is overkill when xmalloc is just being used as a convenience
wrapper to exit when no memory is available.

So defer setting try_to_free_pack_memory as try_to_free_routine until
the first packfile is opened in add_packed_git().

After this change, a simple program using xmalloc() and no other
functions will not pull in any code from libgit.a aside from wrapper.o
and usage.o.

Improved-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-10 11:11:07 -08:00
Jonathan Nieder 58ecbd5ede wrapper: move xmmap() to sha1_file.c
wrapper.o depends on sha1_file.o for a number of reasons.  One is
release_pack_memory().

xmmap function calls mmap, discarding unused pack windows when
necessary to relieve memory pressure.  Simple git programs using
wrapper.o as a friendly libc do not need this functionality.
So move xmmap to sha1_file.o, where release_pack_memory() is.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-10 11:03:13 -08:00
Shawn O. Pearce f2e872aa5e Work around EMFILE when there are too many pack files
When opening any files in the object database, release unused pack
windows if the open(2) syscall fails due to EMFILE (too many open
files in this process).  This allows Git to degrade gracefully on
a repository with thousands of pack files, and a commit stored in
a loose object in the middle of the history.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 10:21:46 -07:00
Shawn O. Pearce 4865d2b662 Use git_open_noatime when accessing pack data
This utility function avoids an unnecessary update of the access time
for a loose object file.  Just as the atime isn't useful on a loose
object, its not useful on the pack or the corresonding idx file.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 09:25:58 -07:00
Junio C Hamano 3ba7a06552 A loose object is not corrupt if it cannot be read due to EMFILE
"git fsck" bails out with a claim that a loose object that cannot be
read but exists on the filesystem to be corrupt, which is wrong when
read_object() failed due to e.g. EMFILE.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 09:24:57 -07:00
Junio C Hamano b6c4ceccb3 read_sha1_file(): report correct name of packfile with a corrupt object
Clarify the error reporting logic by moving the normal codepath (i.e. we
read the object we wanted to read correctly) up and return early.

The logic to report the name of the packfile with a corrupt object,
introduced by e8b15e6 (sha1_file: Show the the type and path to corrupt
objects, 2010-06-10), was totally bogus.  The function that knows which
bad object came from what packfile is has_packed_and_bad(); make it report
which packfile the problem was found.

"Corrupt" is already an adjective, e.g. an object is "corrupt"; we do not
have to say "corrupted object".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 09:24:47 -07:00
Ævar Arnfjörð Bjarmason e8b15e6156 sha1_file: Show the the type and path to corrupt objects
Change the error message that's displayed when we encounter corrupt
objects to be more specific. We now print the type (loose or packed)
of corrupted objects, along with the full path to the file in
question.

Before:

    $ git cat-file blob 909ef997367880aaf2133bafa1f1a71aa28e09df
    fatal: object 909ef997367880aaf2133bafa1f1a71aa28e09df is corrupted

After:

    $ git cat-file blob 909ef997367880aaf2133bafa1f1a71aa28e09df
    fatal: loose object 909ef997367880aaf2133bafa1f1a71aa28e09df (stored in .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df) is corrupted

Knowing the path helps to quickly analyze what's wrong:

    $ file .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df
    .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df: empty

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-07-14 15:35:12 -07:00
Junio C Hamano e391fdfc69 Merge branch 'jk/maint-sha1-file-name-fix'
* jk/maint-sha1-file-name-fix:
  remove over-eager caching in sha1_file_name
2010-06-13 11:22:00 -07:00
Jeff King 560fb6a183 remove over-eager caching in sha1_file_name
This function takes a sha1 and produces a loose object
filename. It caches the location of the object directory so
that it can fill the sha1 information directly without
allocating a new buffer (and in its original incarnation,
without calling getenv(), though these days we cache that
with the code in environment.c).

This cached base directory can become stale, however, if in
a single process git changes the location of the object
directory (e.g., by running setup_work_tree, which will
chdir to the new worktree).

In most cases this isn't a problem, because we tend to set
up the git repository location and do any chdir()s before
actually looking up any objects, so the first lookup will
cache the correct location. In the case of reset --hard,
however, we do something like:

  1. look up the commit object

  2. notice we are doing --hard, run setup_work_tree

  3. look up the tree object to reset

Step (3) fails because our cache object directory value is
bogus.

This patch simply removes the caching. We use a static
buffer instead of allocating one each time (the original
version treated the malloc'd buffer as a static, so there is
no change in calling semantics).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-25 09:21:28 -07:00
Junio C Hamano 035bf8d7c4 Merge branch 'sp/maint-dumb-http-pack-reidx'
* sp/maint-dumb-http-pack-reidx:
  http.c::new_http_pack_request: do away with the temp variable filename
  http-fetch: Use temporary files for pack-*.idx until verified
  http-fetch: Use index-pack rather than verify-pack to check packs
  Allow parse_pack_index on temporary files
  Extract verify_pack_index for reuse from verify_pack
  Introduce close_pack_index to permit replacement
  http.c: Remove unnecessary strdup of sha1_to_hex result
  http.c: Don't store destination name in request structures
  http.c: Drop useless != NULL test in finish_http_pack_request
  http.c: Tiny refactoring of finish_http_pack_request
  t5550-http-fetch: Use subshell for repository operations
  http.c: Remove bad free of static block
2010-05-21 04:02:19 -07:00
Junio C Hamano 636e87d705 Merge branch 'maint'
* maint:
  Documentation/gitdiffcore: fix order in pickaxe description
  Documentation: fix minor inconsistency
  Documentation: rebase -i ignores options passed to "git am"
  hash_object: correction for zero length file
2010-05-18 22:39:56 -07:00
Dmitry Potapov 08bda2085c hash_object: correction for zero length file
The check whether size is zero was done after if size <= SMALL_FILE_SIZE,
as result, zero size case was never triggered. Instead zero length file
was treated as any other small file. This did not caused any problem, but
if we have a special case for size equal to zero, it is better to make it
work and avoid redundant malloc().

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-18 21:46:36 -07:00
Shawn O. Pearce 7b64469a36 Allow parse_pack_index on temporary files
The easiest way to verify a pack index is to open it through the
standard parse_pack_index function, permitting the header check
to happen when the file is mapped.  However, the dumb HTTP client
needs to verify a pack index before its moved into its proper file
name within the objects/pack directory, to prevent a corrupt index
from being made available.  So permit the caller to specify the
exact path of the index file.

For now we're still using the final destination name within the
sole call site in http.c, but eventually we will start to parse
the temporary path instead.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-19 17:56:17 -07:00
Shawn O. Pearce fa5fc15d6e Introduce close_pack_index to permit replacement
By closing the pack index, a caller can later overwrite the index
with an updated index file, possibly after converting from v1 to
the v2 format.  Because p->index_data is NULL after close, on the
next access the index will be opened again and the other members
will be updated with new data.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-19 17:56:08 -07:00
Jeff King 40d52ff77b make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.

The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".

Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-01 23:53:54 -07:00
Jeff King c00e657df2 fix const-correctness of write_sha1_file
These should take const buffers as input data, but zlib's
next_in pointer is not const-correct. Let's fix it at the
zlib level, though, so the cast happens in one obvious
place. This should be safe, as a similar cast is used in
zlib's example code for a const array.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-01 23:49:03 -07:00
Junio C Hamano 493e433277 Merge branch 'mm/mkstemps-mode-for-packfiles' into maint
* mm/mkstemps-mode-for-packfiles:
  Use git_mkstemp_mode instead of plain mkstemp to create object files
  git_mkstemps_mode: don't set errno to EINVAL on exit.
  Use git_mkstemp_mode and xmkstemp_mode in odb_mkstemp, not chmod later.
  git_mkstemp_mode, xmkstemp_mode: variants of gitmkstemps with mode argument.
  Move gitmkstemps to path.c
  Add a testcase for ACL with restrictive umask.
2010-03-08 00:36:00 -08:00
Junio C Hamano c2b456b895 Merge branch 'nd/root-git'
* nd/root-git:
  Add test for using Git at root of file system
  Support working directory located at root
  Move offset_1st_component() to path.c
  init-db, rev-parse --git-dir: do not append redundant slash
  make_absolute_path(): Do not append redundant slash

Conflicts:
	setup.c
	sha1_file.c
2010-03-07 12:47:15 -08:00
Junio C Hamano 87912fd617 Merge branch 'mm/mkstemps-mode-for-packfiles'
* mm/mkstemps-mode-for-packfiles:
  Use git_mkstemp_mode instead of plain mkstemp to create object files
  git_mkstemps_mode: don't set errno to EINVAL on exit.
  Use git_mkstemp_mode and xmkstemp_mode in odb_mkstemp, not chmod later.
  git_mkstemp_mode, xmkstemp_mode: variants of gitmkstemps with mode argument.
  Move gitmkstemps to path.c
  Add a testcase for ACL with restrictive umask.
2010-03-07 12:47:14 -08:00
Junio C Hamano 780fc9a0a6 Merge branch 'dp/read-not-mmap-small-loose-object' into maint
* dp/read-not-mmap-small-loose-object:
  hash-object: don't use mmap() for small files
2010-03-04 22:26:17 -08:00
Junio C Hamano 34c014d13e Merge branch 'np/compress-loose-object-memsave'
* np/compress-loose-object-memsave:
  sha1_file: be paranoid when creating loose objects
  sha1_file: don't malloc the whole compressed result when writing out objects
2010-03-02 12:44:09 -08:00
Matthieu Moy 5256b00631 Use git_mkstemp_mode instead of plain mkstemp to create object files
We used to unnecessarily give the read permission to group and others,
regardless of the umask, which isn't serious because the objects are
still protected by their containing directory, but isn't necessary
either.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-22 15:24:46 -08:00
Nicolas Pitre 748af44c63 sha1_file: be paranoid when creating loose objects
We don't want the data being deflated and stored into loose objects
to be different from what we expect.  While the deflated data is
protected by a CRC which is good enough for safe data retrieval
operations, we still want to be doubly sure that the source data used
at object creation time is still what we expected once that data has
been deflated and its CRC32 computed.

The most plausible data corruption may occur if the source file is
modified while Git is deflating and writing it out in a loose object.
Or Git itself could have a bug causing memory corruption.  Or even bad
RAM could cause trouble.  So it is best to make sure everything is
coherent and checksum protected from beginning to end.

To do so we compute the SHA1 of the data being deflated _after_ the
deflate operation has consumed that data, and make sure it matches
with the expected SHA1.  This way we can rely on the CRC32 checked by
the inflate operation to provide a good indication that the data is still
coherent with its SHA1 hash.  One pathological case we ignore is when
the data is modified before (or during) deflate call, but changed back
before it is hashed.

There is some overhead of course. Using 'git add' on a set of large files:

Before:

	real    0m25.210s
	user    0m23.783s
	sys     0m1.408s

After:

	real    0m26.537s
	user    0m25.175s
	sys     0m1.358s

The overhead is around 5% for full data coherency guarantee.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-21 22:33:25 -08:00
Dmitry Potapov ea68b0ce9f hash-object: don't use mmap() for small files
Using read() instead of mmap() can be 39% speed up for 1Kb files and is
1% speed up 1Mb files. For larger files, it is better to use mmap(),
because the difference between is not significant, and when there is not
enough memory, mmap() performs much better, because it avoids swapping.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-21 11:39:10 -08:00
Nicolas Pitre 9892bebafe sha1_file: don't malloc the whole compressed result when writing out objects
There is no real advantage to malloc the whole output buffer and
deflate the data in a single pass when writing loose objects. That is
like only 1% faster while using more memory, especially with large
files where memory usage is far more. It is best to deflate and write
the data out in small chunks reusing the same memory instead.

For example, using 'git add' on a few large files averaging 40 MB ...

Before:
21.45user 1.10system 0:22.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+828040outputs (0major+142640minor)pagefaults 0swaps

After:
21.50user 1.25system 0:22.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+828040outputs (0major+104408minor)pagefaults 0swaps

While the runtime stayed relatively the same, the number of minor page
faults went down significantly.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-21 11:36:23 -08:00
Nguyễn Thái Ngọc Duy 4bb43de259 Move offset_1st_component() to path.c
The implementation is also lightly modified to use is_dir_sep()
instead of hardcoding '/'.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-16 08:54:34 -08:00
Junio C Hamano a0075d9e6a Merge branch 'il/maint-xmallocz'
* il/maint-xmallocz:
  Fix integer overflow in unpack_compressed_entry()
  Fix integer overflow in unpack_sha1_rest()
  Fix integer overflow in patch_delta()
  Add xmallocz()
2010-01-27 14:56:38 -08:00
Ilari Liusvaara 4ab07e4d10 Fix integer overflow in unpack_compressed_entry()
Signed-off-by: Ilari Liusvaara <ilari.liusvaara@elisanet.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-26 13:00:16 -08:00
Ilari Liusvaara 3aee68aa68 Fix integer overflow in unpack_sha1_rest()
[jc: later NUL termination by the caller becomes unnecessary]

Signed-off-by: Ilari Liusvaara <ilari.liusvaara@elisanet.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-26 13:00:10 -08:00
Linus Torvalds a5031214c4 slim down "git show-index"
As the documentation says, this is primarily for debugging, and
in the longer term we should rename it to test-show-index or something.

In the meantime, just avoid xmalloc (which slurps in the rest of git), and
separating out the trivial hex functions into "hex.o".

This results in

  [torvalds@nehalem git]$ size git-show-index
       text    data     bss     dec     hex filename
     222818    2276  112688  337782   52776 git-show-index (before)
       5696     624    1264    7584    1da0 git-show-index (after)

which is a whole lot better.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-21 20:03:45 -08:00
Junio C Hamano 356521ab22 sha1_file.c: remove unused function
has_pack_file() is not used anywhere.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-12 01:06:09 -08:00
Junio C Hamano 39eea7bdd9 Fix incorrect error check while reading deflated pack data
The loop in get_size_from_delta() feeds a deflated delta data from the
pack stream _until_ we get inflated result of 20 bytes[*] or we reach the
end of stream.

    Side note. This magic number 20 does not have anything to do with the
    size of the hash we use, but comes from 1a3b55c (reduce delta head
    inflated size, 2006-10-18).

The loop reads like this:

    do {
        in = use_pack();
        stream.next_in = in;
        st = git_inflate(&stream, Z_FINISH);
        curpos += stream.next_in - in;
    } while ((st == Z_OK || st == Z_BUF_ERROR) &&
             stream.total_out < sizeof(delta_head));

This git_inflate() can return:

 - Z_STREAM_END, if use_pack() fed it enough input and the delta itself
   was smaller than 20 bytes;

 - Z_OK, when some progress has been made;

 - Z_BUF_ERROR, if no progress is possible, because we either ran out of
   input (due to corrupt pack), or we ran out of output before we saw the
   end of the stream.

The fix b3118bd (sha1_file: Fix infinite loop when pack is corrupted,
2009-10-14) attempted was against a corruption that appears to be a valid
stream that produces a result larger than the output buffer, but we are
not even trying to read the stream to the end in this loop.  If avail_out
becomes zero, total_out will be the same as sizeof(delta_head) so the loop
will terminate without the "fix".  There is no fix from b3118bd needed for
this loop, in other words.

The loop in unpack_compressed_entry() is quite a different story.  It
feeds a deflated stream (either delta or base) and allows the stream to
produce output up to what we expect but no more.

    do {
        in = use_pack();
        stream.next_in = in;
        st = git_inflate(&stream, Z_FINISH);
        curpos += stream.next_in - in;
    } while (st == Z_OK || st == Z_BUF_ERROR)

This _does_ risk falling into an endless interation, as we can exhaust
avail_out if the length we expect is smaller than what the stream wants to
produce (due to pack corruption).  In such a case, avail_out will become
zero and inflate() will return Z_BUF_ERROR, while avail_in may (or may
not) be zero.

But this is not a right fix:

    do {
        in = use_pack();
        stream.next_in = in;
        st = git_inflate(&stream, Z_FINISH);
+       if (st == Z_BUF_ERROR && (stream.avail_in || !stream.avail_out)
+               break; /* wants more input??? */
        curpos += stream.next_in - in;
    } while (st == Z_OK || st == Z_BUF_ERROR)

as Z_BUF_ERROR from inflate() may be telling us that avail_in has also run
out before reading the end of stream marker.  In such a case, both avail_in
and avail_out would be zero, and the loop should iterate to allow the end
of stream marker to be seen by inflate from the input stream.

The right fix for this loop is likely to be to increment the initial
avail_out by one (we allocate one extra byte to terminate it with NUL
anyway, so there is no risk to overrun the buffer), and break out if we
see that avail_out has become zero, in order to detect that the stream
wants to produce more than what we expect.  After the loop, we have a
check that exactly tests this condition:

    if ((st != Z_STREAM_END) || stream.total_out != size) {
        free(buffer);
        return NULL;
    }

So here is a patch (without my previous botched attempts) to fix this
issue.  The first hunk reverts the corresponding hunk from b3118bd, and
the second hunk is the same fix proposed earlier.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-21 23:19:47 -07:00
Shawn O. Pearce b3118bdc91 sha1_file: Fix infinite loop when pack is corrupted
Some types of corruption to a pack may confuse the deflate stream
which stores an object.  In Andy's reported case a 36 byte region
of the pack was overwritten, leading to what appeared to be a valid
deflate stream that was trying to produce a result larger than our
allocated output buffer could accept.

Z_BUF_ERROR is returned from inflate() if either the input buffer
needs more input bytes, or the output buffer has run out of space.
Previously we only considered the former case, as it meant we needed
to move the stream's input buffer to the next window in the pack.

We now abort the loop if inflate() returns Z_BUF_ERROR without
consuming the entire input buffer it was given, or has filled
the entire output buffer but has not yet returned Z_STREAM_END.
Either state is a clear indicator that this loop is not working
as expected, and should not continue.

This problem cannot occur with loose objects as we open the entire
loose object as a single buffer and treat Z_BUF_ERROR as an error.

Reported-by: Andy Isaacson <adi@hexapodia.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-14 13:39:37 -07:00
Junio C Hamano f00ecbe42b Merge branch 'cc/replace'
* cc/replace:
  t6050: check pushing something based on a replaced commit
  Documentation: add documentation for "git replace"
  Add git-replace to .gitignore
  builtin-replace: use "usage_msg_opt" to give better error messages
  parse-options: add new function "usage_msg_opt"
  builtin-replace: teach "git replace" to actually replace
  Add new "git replace" command
  environment: add global variable to disable replacement
  mktag: call "check_sha1_signature" with the replacement sha1
  replace_object: add a test case
  object: call "check_sha1_signature" with the replacement sha1
  sha1_file: add a "read_sha1_file_repl" function
  replace_object: add mechanism to replace objects found in "refs/replace/"
  refs: add a "for_each_replace_ref" function
2009-08-21 18:47:53 -07:00
Pierre Habouzit f630cfda88 refactor: use bitsizeof() instead of 8 * sizeof()
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-22 21:57:41 -07:00
Junio C Hamano dd787c19c4 Merge branch 'tr/die_errno'
* tr/die_errno:
  Use die_errno() instead of die() when checking syscalls
  Convert existing die(..., strerror(errno)) to die_errno()
  die_errno(): double % in strerror() output just in case
  Introduce die_errno() that appends strerror(errno) to die()
2009-07-06 09:39:46 -07:00
Thomas Rast d824cbba02 Convert existing die(..., strerror(errno)) to die_errno()
Change calls to die(..., strerror(errno)) to use the new die_errno().

In the process, also make slight style adjustments: at least state
_something_ about the function that failed (instead of just printing
the pathname), and put paths in single quotes.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-27 11:14:53 -07:00
Linus Torvalds 48fb7deb5b Fix big left-shifts of unsigned char
Shifting 'unsigned char' or 'unsigned short' left can result in sign
extension errors, since the C integer promotion rules means that the
unsigned char/short will get implicitly promoted to a signed 'int' due to
the shift (or due to other operations).

This normally doesn't matter, but if you shift things up sufficiently, it
will now set the sign bit in 'int', and a subsequent cast to a bigger type
(eg 'long' or 'unsigned long') will now sign-extend the value despite the
original expression being unsigned.

One example of this would be something like

	unsigned long size;
	unsigned char c;

	size += c << 24;

where despite all the variables being unsigned, 'c << 24' ends up being a
signed entity, and will get sign-extended when then doing the addition in
an 'unsigned long' type.

Since git uses 'unsigned char' pointers extensively, we actually have this
bug in a couple of places.

I may have missed some, but this is the result of looking at

	git grep '[^0-9 	][ 	]*<<[ 	][a-z]' -- '*.c' '*.h'
	git grep '<<[   ]*24'

which catches at least the common byte cases (shifting variables by a
variable amount, and shifting by 24 bits).

I also grepped for just 'unsigned char' variables in general, and
converted the ones that most obviously ended up getting implicitly cast
immediately anyway (eg hash_name(), encode_85()).

In addition to just avoiding 'unsigned char', this patch also tries to use
a common idiom for the delta header size thing. We had three different
variations on it: "& 0x7fUL" in one place (getting the sign extension
right), and "& ~0x80" and "& 0x7f" in two other places (not getting it
right). Apart from making them all just avoid using "unsigned char" at
all, I also unified them to then use a simple "& 0x7f".

I considered making a sparse extension which warns about doing implicit
casts from unsigned types to signed types, but it gets rather complex very
quickly, so this is just a hack.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-18 09:22:46 -07:00
Christian Couder f5552aee39 sha1_file: add a "read_sha1_file_repl" function
This new function will replace "read_sha1_file". This latter function
becoming just a stub to call the former will a NULL "replacement"
argument.

This new function is needed because sometimes we need to use the
replacement sha1.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-31 17:02:59 -07:00
Christian Couder 6809557029 replace_object: add mechanism to replace objects found in "refs/replace/"
The code implementing this mechanism has been copied more-or-less
from the commit graft code.

This mechanism is used in "read_sha1_file". sha1 passed to this
function that match a ref name in "refs/replace/" are replaced by
the sha1 that has been read in the ref.

We "die" if the replacement recursion depth is too high or if we
can't read the replacement object.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-31 17:02:59 -07:00
Junio C Hamano 2c5942dbae Merge branch 'ar/unlink-err' into maint
* ar/unlink-err:
  print unlink(2) errno in copy_or_link_directory
  replace direct calls to unlink(2) with unlink_or_warn
  Introduce an unlink(2) wrapper which gives warning if unlink failed
2009-05-25 19:01:50 -07:00
Junio C Hamano 065b0702f7 Merge branch 'maint'
* maint:
  grep: fix word-regexp colouring
  completion: use git rev-parse to detect bare repos
  Cope better with a _lot_ of packs
  for-each-ref: fix segfault in copy_email
2009-05-20 18:59:09 -07:00
Johannes Schindelin fd73ccf279 Cope better with a _lot_ of packs
You might end up with a situation where you have tons of pack files, e.g.
when using hg2git.  In this situation, all kinds of operations may
end up with a "too many files open" error.  Let's recover gracefully from
that.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Looks-right-to-me-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-20 18:23:06 -07:00
Junio C Hamano 36587681b4 Merge branch 'ar/unlink-err'
* ar/unlink-err:
  print unlink(2) errno in copy_or_link_directory
  replace direct calls to unlink(2) with unlink_or_warn
  Introduce an unlink(2) wrapper which gives warning if unlink failed
2009-05-18 09:01:06 -07:00
Felipe Contreras 4b25d091ba Fix a bunch of pointer declarations (codestyle)
Essentially; s/type* /type */ as per the coding guidelines.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-01 15:17:31 -07:00
Alex Riesen 691f1a28bf replace direct calls to unlink(2) with unlink_or_warn
This helps to notice when something's going wrong, especially on
systems which lock open files.

I used the following criteria when selecting the code for replacement:
- it was already printing a warning for the unlink failures
- it is in a function which already printing something or is
  called from such a function
- it is in a static function, returning void and the function is only
  called from a builtin main function (cmd_)
- it is in a function which handles emergency exit (signal handlers)
- it is in a function which is obvously cleaning up the lockfiles

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-29 18:37:41 -07:00
Johannes Schindelin 348df16679 Rename core.unreliableHardlinks to core.createObject
"Unreliable hardlinks" is a misleading description for what is happening.
So rename it to something less misleading.

Suggested by Linus Torvalds.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-29 16:50:07 -07:00
Johannes Schindelin be66a6c43d Add an option not to use link(src, dest) && unlink(src) when that is unreliable
It seems that accessing NTFS partitions with ufsd (at least on my EeePC)
has an unnerving bug: if you link() a file and unlink() it right away,
the target of the link() will have the correct size, but consist of NULs.

It seems as if the calls are simply not serialized correctly, as single-stepping
through the function move_temp_to_file() works flawlessly.

As ufsd is "Commertial software" (sic!), I cannot fix it, and have to work
around it in Git.

At the same time, it seems that this fixes msysGit issues 222 and 229 to
assume that Windows cannot handle link() && unlink().

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-25 09:49:21 -07:00
Junio C Hamano 03a39a9184 Merge branch 'jc/shared-literally'
* jc/shared-literally:
  t1301: loosen test for forced modes
  set_shared_perm(): sometimes we know what the final mode bits should look like
  move_temp_to_file(): do not forget to chmod() in "Coda hack" codepath
  Move chmod(foo, 0444) into move_temp_to_file()
  "core.sharedrepository = 0mode" should set, not loosen
2009-04-06 00:42:52 -07:00
Junio C Hamano 3c91bf6805 Merge branch 'jc/maint-1.6.0-keep-pack'
* jc/maint-1.6.0-keep-pack:
  pack-objects: don't loosen objects available in alternate or kept packs
  t7700: demonstrate repack flaw which may loosen objects unnecessarily
  Remove --kept-pack-only option and associated infrastructure
  pack-objects: only repack or loosen objects residing in "local" packs
  git-repack.sh: don't use --kept-pack-only option to pack-objects
  t7700-repack: add two new tests demonstrating repacking flaws

Conflicts:
	t/t7700-repack.sh
2009-04-01 22:34:19 -07:00
Junio C Hamano 17e61b8288 set_shared_perm(): sometimes we know what the final mode bits should look like
adjust_shared_perm() first obtains the mode bits from lstat(2), expecting
to find what the result of applying user's umask is, and then tweaks it
as necessary.  When the file to be adjusted is created with mkstemp(3),
however, the mode thusly obtained does not have anything to do with user's
umask, and we would need to start from 0444 in such a case and there is no
point running lstat(2) for such a path.

This introduces a new API set_shared_perm() to bypass the lstat(2) and
instead force setting the mode bits to the desired value directly.
adjust_shared_perm() becomes a thin wrapper to the function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-28 08:02:15 -07:00
Junio C Hamano 3be1f18e1b move_temp_to_file(): do not forget to chmod() in "Coda hack" codepath
Now move_temp_to_file() is responsible for doing everything that is
necessary to turn a tempfile in $GIT_DIR into its final form, it must make
sure "Coda hack" codepath correctly makes the file read-only.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-28 08:01:21 -07:00
Johan Herland fb8b193670 Move chmod(foo, 0444) into move_temp_to_file()
When writing out a loose object or a pack (index), move_temp_to_file() is
called to finalize the resulting file. These files (loose files and packs)
should all have permission mode 0444 (modulo adjust_shared_perm()).
Therefore, instead of doing chmod(foo, 0444) explicitly from each callsite
(or even forgetting to chmod() at all), do the chmod() call from within
move_temp_to_file().

Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-27 22:10:58 -07:00
Junio C Hamano 5a688fe470 "core.sharedrepository = 0mode" should set, not loosen
This fixes the behaviour of octal notation to how it is defined in the
documentation, while keeping the traditional "loosen only" semantics
intact for "group" and "everybody".

Three main points of this patch are:

 - For an explicit octal notation, the internal shared_repository variable
   is set to a negative value, so that we can tell "group" (which is to
   "OR" in 0660) and 0660 (which is to "SET" to 0660);

 - git-init did not set shared_repository variable early enough to affect
   the initial creation of many files, notably copied templates and the
   configuration.  We set it very early when a command-line option
   specifies a custom value.

 - Many codepaths create files inside $GIT_DIR by various ways that all
   involve mkstemp(), and then call move_temp_to_file() to rename it to
   its final destination.  We can add adjust_shared_perm() call here; for
   the traditional "loosen-only", this would be a no-op for many codepaths
   because the mode is already loose enough, but with the new behaviour it
   makes a difference.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-27 21:51:04 -07:00
Junio C Hamano 89fbda2425 Merge branch 'maint'
* maint:
  Increase the size of the die/warning buffer to avoid truncation
  close_sha1_file(): make it easier to diagnose errors
  avoid possible overflow in delta size filtering computation
2009-03-24 19:45:57 -07:00
Junio C Hamano b0de555410 Merge branch 'maint-1.6.1' into maint
* maint-1.6.1:
  close_sha1_file(): make it easier to diagnose errors
  avoid possible overflow in delta size filtering computation
2009-03-24 15:31:21 -07:00
Junio C Hamano 2a5643da73 Merge branch 'maint-1.6.0' into maint-1.6.1
* maint-1.6.0:
  close_sha1_file(): make it easier to diagnose errors
  avoid possible overflow in delta size filtering computation
2009-03-24 15:31:15 -07:00
Linus Torvalds e8bd78c3fc close_sha1_file(): make it easier to diagnose errors
A bug report with "unable to write sha1 file" made us realize that we do
not have enough information to guess why close() is failing.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-24 14:39:20 -07:00
Brandon Casey 4d6acb7041 Remove --kept-pack-only option and associated infrastructure
This option to pack-objects/rev-list was created to improve the -A and -a
options of repack.  It was found to be lacking in that it did not provide
the ability to differentiate between local and non-local kept packs, and
found to be unnecessary since objects residing in local kept packs can be
filtered out by the --honor-pack-keep option.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-20 13:32:33 -07:00
Junio C Hamano aec813062b Merge branch 'jc/maint-1.6.0-keep-pack'
* jc/maint-1.6.0-keep-pack:
  is_kept_pack(): final clean-up
  Simplify is_kept_pack()
  Consolidate ignore_packed logic more
  has_sha1_kept_pack(): take "struct rev_info"
  has_sha1_pack(): refactor "pretend these packs do not exist" interface
  git-repack: resist stray environment variable
2009-03-11 13:49:56 -07:00
Junio C Hamano 69e020ae00 is_kept_pack(): final clean-up
Now is_kept_pack() is just a member lookup into a structure, we can write
it as such.

Also rewrite the sole caller of has_sha1_kept_pack() to switch on the
criteria the callee uses (namely, revs->kept_pack_only) between calling
has_sha1_kept_pack() and has_sha1_pack(), so that these two callees do not
have to take a pointer to struct rev_info as an argument.

This removes the header file dependency issue temporarily introduced by
the earlier commit, so we revert changes associated to that as well.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-28 01:06:06 -08:00