Commit graph

85 commits

Author SHA1 Message Date
Junio C Hamano
1cb4b3d380 Merge branch 'js/fsck-tag-validation'
New tag object format validation added in 2.2 showed garbage
after a tagname it reported in its error message.

* js/fsck-tag-validation:
  index-pack: terminate object buffers with NUL
  fsck: properly bound "invalid tag name" error message
2014-12-22 12:27:41 -08:00
Duy Nguyen
a1e920a0a7 index-pack: terminate object buffers with NUL
We have some tricky checks in fsck that rely on a side effect of
require_end_of_header(), and would otherwise easily run outside
non-NUL-terminated buffers. This is a bit brittle, so let's make sure
that only NUL-terminated buffers are passed around to begin with.

Jeff "Peff" King contributed the detailed analysis which call paths are
involved and pointed out that we also have to patch the get_data()
function in unpack-objects.c, which is what Johannes "Dscho" Schindelin
implemented.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Analyzed-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-09 11:56:37 -08:00
Etienne Buira
e0e21283b6 index-pack: fix compilation with NO_PTHREADS
type_cas_lock/unlock() should be defined as no-op for NO_PTHREADS
build, just like all the other locking primitives.

Signed-off-by: Etienne Buira <etienne.buira@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-13 12:33:30 -07:00
Junio C Hamano
1c2ea2cdc0 Merge branch 'rs/realloc-array'
Code cleanup.

* rs/realloc-array:
  use REALLOC_ARRAY for changing the allocation size of arrays
  add macro REALLOC_ARRAY
2014-09-26 14:39:45 -07:00
Junio C Hamano
13f4f04692 Merge branch 'js/fsck-tag-validation'
Teach "git fsck" to inspect the contents of annotated tag objects.

* js/fsck-tag-validation:
  Make sure that index-pack --strict checks tag objects
  Add regression tests for stricter tag fsck'ing
  fsck: check tag objects' headers
  Make sure fsck_commit_buffer() does not run out of the buffer
  fsck_object(): allow passing object data separately from the object itself
  Refactor type_from_string() to allow continuing after detecting an error
2014-09-26 14:39:43 -07:00
Junio C Hamano
bd656f6e7b Merge branch 'jk/index-pack-threading-races'
When receiving an invalid pack stream that records the same object
twice, multiple threads got confused due to a race.  We should
reject or correct such a stream upon receiving, but that will be a
larger change.

* jk/index-pack-threading-races:
  index-pack: fix race condition with duplicate bases
2014-09-19 11:38:34 -07:00
René Scharfe
2756ca4347 use REALLOC_ARRAY for changing the allocation size of arrays
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-18 09:13:42 -07:00
Johannes Schindelin
90a398bbd7 fsck_object(): allow passing object data separately from the object itself
When fsck'ing an incoming pack, we need to fsck objects that cannot be
read via read_sha1_file() because they are not local yet (and might even
be rejected if transfer.fsckobjects is set to 'true').

For commits, there is a hack in place: we basically cache commit
objects' buffers anyway, but the same is not true, say, for tag objects.

By refactoring fsck_object() to take the object buffer and size as
optional arguments -- optional, because we still fall back to the
previous method to look at the cached commit objects if the caller
passes NULL -- we prepare the machinery for the upcoming handling of tag
objects.

The assumption that such buffers are inherently NUL terminated is now
wrong, of course, hence we pass the size of the buffer so that we can
add a sanity check later, to prevent running past the end of the buffer.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-10 13:54:21 -07:00
Jeff King
ab791dd138 index-pack: fix race condition with duplicate bases
When we are resolving deltas in an indexed pack, we do it by
first selecting a potential base (either one stored in full
in the pack, or one created by resolving another delta), and
then resolving any deltas that use that base.  When we
resolve a particular delta, we flip its "real_type" field
from OBJ_{REF,OFS}_DELTA to whatever the real type is.

We assume that traversing the objects this way will visit
each delta only once. This is correct for most packs; we
visit the delta only when we process its base, and each
object (and thus each base) appears only once. However, if a
base object appears multiple times in the pack, we will try
to resolve any deltas based on it once for each instance.

We can detect this case by noting that a delta we are about
to resolve has already had its real_type field flipped, and
we already do so with an assert().  However, if multiple
threads are in use, we may race with another thread on
comparing and flipping the field. We need to synchronize the
access.

The right mechanism for doing this is a compare-and-swap (we
atomically "claim" the delta for our own and find out
whether our claim was successful). We can implement this
in C by using a pthread mutex to protect the operation. This
is not the fastest way of doing a compare-and-swap; many
processors provide instructions for this, and gcc and other
compilers provide builtins to access them. However, some
experiments showed that lock contention does not cause a
significant slowdown here. Adding c-a-s support for many
compilers would increase the maintenance burden (and we
would still end up including the pthread version as a
fallback).

Note that we only need to touch the OBJ_REF_DELTA codepath
here. An OBJ_OFS_DELTA object points to its base using an
offset, and therefore has only one base, even if another
copy of that base object appears in the pack (we do still
touch it briefly because the setting of real_type is
factored out of resolve_data).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-29 14:50:43 -07:00
Junio C Hamano
9ab0882255 Merge branch 'maint'
* maint:
  use xmemdupz() to allocate copies of strings given by start and length
  use xcalloc() to allocate zero-initialized memory
2014-07-21 12:35:39 -07:00
René Scharfe
51a60f5bfb use xcalloc() to allocate zero-initialized memory
Use xcalloc() instead of xmalloc() followed by memset() to allocate
and zero out memory because it's shorter and avoids duplicating the
function parameters.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-21 10:30:21 -07:00
Junio C Hamano
6e4094731a Merge branch 'jk/strip-suffix'
* jk/strip-suffix:
  prepare_packed_git_one: refactor duplicate-pack check
  verify-pack: use strbuf_strip_suffix
  strbuf: implement strbuf_strip_suffix
  index-pack: use strip_suffix to avoid magic numbers
  use strip_suffix instead of ends_with in simple cases
  replace has_extension with ends_with
  implement ends_with via strip_suffix
  add strip_suffix function
  sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one()
2014-07-16 11:26:00 -07:00
Junio C Hamano
5c18fde0d9 Merge branch 'jk/commit-buffer-length' into maint
A handful of code paths had to read the commit object more than
once when showing header fields that are usually not parsed.  The
internal data structure to keep track of the contents of the commit
object has been updated to reduce the need for this double-reading,
and to allow the caller find the length of the object.

* jk/commit-buffer-length:
  reuse cached commit buffer when parsing signatures
  commit: record buffer length in cache
  commit: convert commit->buffer to a slab
  commit-slab: provide a static initializer
  use get_commit_buffer everywhere
  convert logmsg_reencode to get_commit_buffer
  use get_commit_buffer to avoid duplicate code
  use get_cached_commit_buffer where appropriate
  provide helpers to access the commit buffer
  provide a helper to set the commit buffer
  provide a helper to free commit buffer
  sequencer: use logmsg_reencode in get_message
  logmsg_reencode: return const buffer
  do not create "struct commit" with xcalloc
  commit: push commit_index update into alloc_commit_node
  alloc: include any-object allocations in alloc_report
  replace dangerous uses of strbuf_attach
  commit_tree: take a pointer/len pair rather than a const strbuf
2014-07-16 11:16:38 -07:00
Junio C Hamano
8061ae8b46 Merge branch 'jk/commit-buffer-length'
Move "commit->buffer" out of the in-core commit object and keep
track of their lengths.  Use this to optimize the code paths to
validate GPG signatures in commit objects.

* jk/commit-buffer-length:
  reuse cached commit buffer when parsing signatures
  commit: record buffer length in cache
  commit: convert commit->buffer to a slab
  commit-slab: provide a static initializer
  use get_commit_buffer everywhere
  convert logmsg_reencode to get_commit_buffer
  use get_commit_buffer to avoid duplicate code
  use get_cached_commit_buffer where appropriate
  provide helpers to access the commit buffer
  provide a helper to set the commit buffer
  provide a helper to free commit buffer
  sequencer: use logmsg_reencode in get_message
  logmsg_reencode: return const buffer
  do not create "struct commit" with xcalloc
  commit: push commit_index update into alloc_commit_node
  alloc: include any-object allocations in alloc_report
  replace dangerous uses of strbuf_attach
  commit_tree: take a pointer/len pair rather than a const strbuf
2014-07-02 12:53:02 -07:00
Jeff King
592ce20820 index-pack: use strip_suffix to avoid magic numbers
We also switch to using strbufs, which lets us avoid the
potentially dangerous combination of a manual malloc
followed by a strcpy.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-30 13:43:17 -07:00
Jeff King
2975c770ca replace has_extension with ends_with
These two are almost the same function, with the exception
that has_extension only matches if there is content before
the suffix. So ends_with(".exe", ".exe") is true, but
has_extension would not be.

This distinction does not matter to any of the callers,
though, and we can just replace uses of has_extension with
ends_with. We prefer the "ends_with" name because it is more
generic, and there is nothing about the function that
requires it to be used for file extensions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-30 13:43:16 -07:00
Junio C Hamano
1881d2b88c Merge branch 'ym/fix-opportunistic-index-update-race' into maint
"git status", even though it is a read-only operation, tries to
update the index with refreshed lstat(2) info to optimize future
accesses to the working tree opportunistically, but this could
race with a "read-write" operation that modify the index while it
is running.  Detect such a race and avoid overwriting the index.

* ym/fix-opportunistic-index-update-race:
  read-cache.c: verify index file before we opportunistically update it
  wrapper.c: add xpread() similar to xread()
2014-06-25 11:49:48 -07:00
Junio C Hamano
182c3d69e4 Merge branch 'jk/index-pack-report-missing' into maint
The error reporting from "git index-pack" has been improved to
distinguish missing objects from type errors.

* jk/index-pack-report-missing:
  index-pack: distinguish missing objects from type errors
2014-06-25 11:48:14 -07:00
Junio C Hamano
a9041df7ab Merge branch 'nd/index-pack-one-fd-per-thread' into maint
We used to disable threaded "git index-pack" on platforms without
thread-safe pread(); use a different workaround for such
platforms to allow threaded "git index-pack".

* nd/index-pack-one-fd-per-thread:
  index-pack: work around thread-unsafe pread()
2014-06-25 11:47:58 -07:00
Jeff King
8597ea3afe commit: record buffer length in cache
Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.

For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.

This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:09:38 -07:00
Jeff King
0fb370da9c provide a helper to free commit buffer
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.

Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it.  But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual).  In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.

Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:07:47 -07:00
Junio C Hamano
a0460132a7 Merge branch 'jk/index-pack-report-missing'
* jk/index-pack-report-missing:
  index-pack: distinguish missing objects from type errors
2014-06-06 11:28:13 -07:00
Junio C Hamano
53f52cd92a Merge branch 'nd/index-pack-one-fd-per-thread'
Enable threaded index-pack on platforms without thread-unsafe
pread() emulation.

* nd/index-pack-one-fd-per-thread:
  index-pack: work around thread-unsafe pread()
2014-06-03 12:06:42 -07:00
Junio C Hamano
9af098c29b Merge branch 'ym/fix-opportunistic-index-update-race'
Read-only operations such as "git status" that internally refreshes
the index write out the refreshed index to the disk to optimize
future accesses to the working tree, but this could race with a
"read-write" operation that modify the index while it is running.
Detect such a race and avoid overwriting the index.

Duy raised a good point that we may need to do the same for the
normal writeout codepath, not just the "opportunistic" update
codepath.  While that is true, nobody sane would be running two
simultaneous operations that are clearly write-oriented competing
with each other against the same index file.  So in that sense that
can be done as a less urgent follow-up for this topic.

* ym/fix-opportunistic-index-update-race:
  read-cache.c: verify index file before we opportunistically update it
  wrapper.c: add xpread() similar to xread()
2014-06-03 12:06:41 -07:00
Jeff King
77583e7739 index-pack: distinguish missing objects from type errors
When we fetch a pack that does not contain an object we
expected to receive, we get an error like:

  $ git init --bare tmp.git && cd tmp.git
  $ git fetch ../parent.git
  [...]
  error: Could not read 964953ec7bcc0245cb1d0db4095455edd21a2f2e
  fatal: Failed to traverse parents of commit b8247b40caf6704fe52736cdece6d6aae87471aa
  error: ../parent.git did not send all necessary objects

This comes from the check_everything_connected rev-list. If
we try cloning the same repo (rather than a fetch), we end
up using index-pack's --check-self-contained-and-connected
option instead, which produces output like:

  $ git clone --no-local --bare parent.git tmp.git
  [...]
  fatal: object of unexpected type
  fatal: index-pack failed

Not only is the sha1 missing, but it's a misleading message.
There's no type problem, but rather a missing object
problem; we don't notice the difference because we simply
compare OBJ_BAD != OBJ_BLOB.  Let's provide a different
message for this case:

  $ git clone --no-local --bare parent.git tmp.git
  fatal: did not receive expected object 6b00a8c61ed379d5f925a72c1987c9c52129d364
  fatal: index-pack failed

While we're at it, let's also improve a true type mismatch
error to look like

  fatal: object 6b00a8c61ed379d5f925a72c1987c9c52129d364: expected type blob, got tree

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-12 11:27:50 -07:00
Nguyễn Thái Ngọc Duy
39539495ac index-pack: work around thread-unsafe pread()
Multi-threaing of index-pack was disabled with c0f8654
(index-pack: Disable threading on cygwin - 2012-06-26), because
pread() implementations for Cygwin and MSYS were not thread
safe.  Recent Cygwin does offer usable pread() and we enabled
multi-threading with 103d530f (Cygwin 1.7 has thread-safe pread,
2013-07-19).

Work around this problem on platforms with a thread-unsafe
pread() emulation by opening one file handle per thread; it
would prevent parallel pread() on different file handles from
stepping on each other.

Also remove NO_THREAD_SAFE_PREAD that was introduced in c0f8654
because it's no longer used anywhere.

This workaround is unconditional, even for platforms with
thread-safe pread() because the overhead is small (a couple file
handles more) and not worth fragmenting the code.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Tested-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-16 09:29:41 -07:00
Yiannis Marangos
9aa91af036 wrapper.c: add xpread() similar to xread()
It is a common mistake to call read(2)/pread(2) and forget to
anticipate that they may return error with EAGAIN/EINTR when the
system call is interrupted.

We have xread() helper to relieve callers of read(2) from having to
worry about it; add xpread() helper to do the same for pread(2).

Update the caller in the builtin/index-pack.c and the mmap emulation
in compat/.

Signed-off-by: Yiannis Marangos <yiannis.marangos@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-10 12:18:55 -07:00
Junio C Hamano
3dd108348f Merge branch 'nd/index-pack-error-message' into maint
* nd/index-pack-error-message:
  index-pack: report error using the correct variable
2014-04-03 13:39:04 -07:00
Junio C Hamano
0e8c09263e Merge branch 'nd/index-pack-error-message'
* nd/index-pack-error-message:
  index-pack: report error using the correct variable
2014-03-25 11:08:19 -07:00
Junio C Hamano
de983a0a18 index-pack: report error using the correct variable
We feed a string pointer that is potentially NULL to die() when
showing the message.  Don't.

Noticed-by: Nguyễn Thái Ngọc Duy  <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-17 15:08:36 -07:00
Michael Haggerty
afc711b8e1 rename read_replace_refs to check_replace_refs
The semantics of this flag was changed in commit

    e1111cef23 inline lookup_replace_object() calls

but wasn't renamed at the time to minimize code churn.  Rename it now,
and add a comment explaining its use.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-20 14:16:55 -08:00
Christian Couder
5955654823 replace {pre,suf}fixcmp() with {starts,ends}_with()
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.

The change can be recreated by mechanically applying this:

    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '

on the result of preparatory changes in this series.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 14:13:21 -08:00
Junio C Hamano
b8f23112f0 Merge branch 'jk/free-tree-buffer'
* jk/free-tree-buffer:
  clear parsed flag when we free tree buffers
2013-09-17 11:37:33 -07:00
Jeff King
6e454b9a31 clear parsed flag when we free tree buffers
Many code paths will free a tree object's buffer and set it
to NULL after finishing with it in order to keep memory
usage down during a traversal. However, out of 8 sites that
do this, only one actually unsets the "parsed" flag back.
Those sites that don't are setting a trap for later users of
the tree object; even after calling parse_tree, the buffer
will remain NULL, causing potential segfaults.

It is not known whether this is triggerable in the current
code. Most commands do not do an in-memory traversal
followed by actually using the objects again. However, it
does not hurt to be safe for future callers.

In most cases, we can abstract this out to a
"free_tree_buffer" helper. However, there are two
exceptions:

  1. The fsck code relies on the parsed flag to know that we
     were able to parse the object at one point. We can
     switch this to using a flag in the "flags" field.

  2. The index-pack code sets the buffer to NULL but does
     not free it (it is freed by a caller). We should still
     unset the parsed flag here, but we cannot use our
     helper, as we do not want to free the buffer.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-06 10:29:12 -07:00
Nguyễn Thái Ngọc Duy
c6807a40dc clone: open a shortcut for connectivity check
In order to make sure the cloned repository is good, we run "rev-list
--objects --not --all $new_refs" on the repository. This is expensive
on large repositories. This patch attempts to mitigate the impact in
this special case.

In the "good" clone case, we only have one pack. If all of the
following are met, we can be sure that all objects reachable from the
new refs exist, which is the intention of running "rev-list ...":

 - all refs point to an object in the pack
 - there are no dangling pointers in any object in the pack
 - no objects in the pack point to objects outside the pack

The second and third checks can be done with the help of index-pack as
a slight variation of --strict check (which introduces a new condition
for the shortcut: pack transfer must be used and the number of objects
large enough to call index-pack). The first is checked in
check_everything_connected after we get an "ok" from index-pack.

"index-pack + new checks" is still faster than the current "index-pack
+ rev-list", which is the whole point of this patch. If any of the
conditions fail, we fall back to the good old but expensive "rev-list
..". In that case it's even more expensive because we have to pay for
the new checks in index-pack. But that should only happen when the
other side is either buggy or malicious.

Cloning linux-2.6 over file://

        before         after
real    3m25.693s      2m53.050s
user    5m2.037s       4m42.396s
sys     0m13.750s      0m16.574s

A more realistic test with ssh:// over wireless

        before         after
real    11m26.629s     10m4.213s
user    5m43.196s      5m19.444s
sys     0m35.812s      0m37.630s

This shortcut is not applied to shallow clones, partly because shallow
clones should have no more objects than a usual fetch and the cost of
rev-list is acceptable, partly to avoid dealing with corner cases when
grafting is involved.

This shortcut does not apply to unpack-objects code path either
because the number of objects must be small in order to trigger that
code path.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28 08:07:20 -07:00
Nguyễn Thái Ngọc Duy
920734b069 index-pack: remove dead code (it should never happen)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-28 08:07:03 -07:00
Junio C Hamano
1d066c58ee Merge branch 'nd/index-pack-threaded-fixes'
"index-pack --verify-stat" used a few counters outside protection
of mutex, possibly showing incorrect numbers.

* nd/index-pack-threaded-fixes:
  index-pack: guard nr_resolved_deltas reads by lock
  index-pack: protect deepest_delta in multithread code
2013-04-01 09:06:23 -07:00
Junio C Hamano
ed8852c286 Merge branch 'jk/index-pack-correct-depth-fix'
"index-pack --fix-thin" used uninitialize value to compute delta
depths of objects it appends to the resulting pack.

* jk/index-pack-correct-depth-fix:
  index-pack: always zero-initialize object_entry list
2013-04-01 09:06:19 -07:00
Jeff King
57165db003 index-pack: always zero-initialize object_entry list
Commit 38a4556 (index-pack: start learning to emulate
"verify-pack -v", 2011-06-03) added a "delta_depth" counter
to each "struct object_entry". Initially, all object entries
have their depth set to 0; in resolve_delta, we then set the
depth of each delta to "base + 1". Base entries never have
their depth touched, and remain at 0.

To ensure that all depths start at 0, that commit changed
calls to xmalloc the object_entry list into calls to
xcalloc.  However, it forgot that we grow the list with
xrealloc later. These extra entries are used when we add an
object from elsewhere to complete a thin pack. If we add a
non-delta object, its depth value will just be uninitialized
heap data.

This patch fixes it by zero-initializing entries we add to
the objects list via the xrealloc.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-20 12:53:26 -07:00
Thomas Rast
8f82aad4e7 index-pack: guard nr_resolved_deltas reads by lock
The threaded parts of index-pack increment the number of resolved
deltas in nr_resolved_deltas guarded by counter_mutex.  However, the
per-thread outer loop accessed nr_resolved_deltas without any locks.

This is not wrong as such, since it doesn't matter all that much
whether we get an outdated value.  However, unless someone proves that
this one lock makes all the performance difference, it would be much
cleaner to guard _all_ accesses to the variable with the lock.

The only such use is display_progress() in the threaded section (all
others are in the conclude_pack() callchain outside the threaded
part).  To make it obvious that it cannot deadlock, move it out of
work_mutex.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Reviewed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-19 08:40:47 -07:00
Nguyễn Thái Ngọc Duy
3aba2fddcb index-pack: protect deepest_delta in multithread code
deepest_delta is a global variable but is updated without protection
in resolve_delta(), a multithreaded function. Add a new mutex for it,
but only protect and update when it's actually used (i.e. show_stat is
non-zero).

Another variable that will not be updated is delta_depth in "struct
object_entry" as it's only useful when show_stat is 1. Putting it in
"if (show_stat)" makes it clearer.

The local variable "stat" is renamed to "show_stat" after moving to
global scope because the name "stat" conflicts with stat(2) syscall.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-19 08:40:19 -07:00
Nguyễn Thái Ngọc Duy
5c3459fc61 index-pack: fix buffer overflow caused by translations
The translation of "completed with %d local objects" is put in a
48-byte buffer, which may be enough for English but not true for any
translations. Convert it to use strbuf (i.e. no hard limit on
translation length).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-16 22:08:53 -07:00
Nguyễn Thái Ngọc Duy
f350df429f i18n: mark more index-pack strings for translation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-31 13:05:05 -07:00
Junio C Hamano
37b92a9a2e Merge branch 'jk/index-pack-streaming-fix'
The streaming index-pack introduced in 1.7.11 had a data corruption
bug, and this should fix it.

* jk/index-pack-streaming-fix:
  index-pack: loop while inflating objects in unpack_data
2012-07-15 21:40:07 -07:00
Jeff King
f8b090386b index-pack: loop while inflating objects in unpack_data
When the unpack_data function is given a consume() callback,
it unpacks only 64K of the input at a time, feeding it to
git_inflate along with a 64K output buffer.  However,
because we are inflating, there is a good chance that the
output buffer will fill before consuming all of the input.
In this case, we need to loop on git_inflate until we have
fed the whole input buffer, feeding each chunk of output to
the consume buffer.

The current code does not do this, and as a result, will
fail the loop condition and trigger a fatal "serious inflate
inconsistency" error in this case.

While we're rearranging the loop, let's get rid of the
extra last_out pointer. It is meant to point to the
beginning of the buffer that we feed to git_inflate, but in
practice this is always the beginning of our same 64K
buffer, because:

  1. At the beginning of the loop, we are feeding the
     buffer.

  2. At the end of the loop, if we are using a consume()
     function, we reset git_inflate's pointer to the
     beginning of the buffer.  If we are not using a
     consume() function, then we do not care about the value
     of last_out at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-10 16:45:00 -07:00
Junio C Hamano
c592023aed Merge branch 'rj/platform-pread-may-be-thread-unsafe'
On Cygwin, the platform pread(3) is not thread safe, just like our
own compat/ emulation, and cannot be used in the index-pack program.

* rj/platform-pread-may-be-thread-unsafe:
  index-pack: Disable threading on cygwin
2012-07-09 09:00:45 -07:00
Junio C Hamano
c0f86547c5 index-pack: Disable threading on cygwin
The Cygwin implementation of pread() is not thread-safe since, just
like the emulation provided by compat/pread.c, it uses a sequence of
seek-read-seek calls. In order to avoid failues due to thread-safety
issues, commit b038a61 disables threading when NO_PREAD is defined.
(ie when using the emulation code in compat/pread.c).

We introduce a new build variable, NO_THREAD_SAFE_PREAD, which allows
use to disable the threaded index-pack code on cygwin, in addition to
the above NO_PREAD case.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-06-26 14:23:03 -07:00
Nguyễn Thái Ngọc Duy
4614043c8f index-pack: use streaming interface for collision test on large blobs
When putting whole objects in core is unavoidable, try match object
type and size first before actually inflating.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-24 10:37:48 -07:00
Nguyễn Thái Ngọc Duy
8a2e163ccd index-pack: factor out unpack core from get_data_from_pack
This allows caller to consume large inflated object with a fixed
amount of memory.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-23 09:08:55 -07:00
Nguyễn Thái Ngọc Duy
9ec2dde9f3 index-pack: use streaming interface on large blobs (most of the time)
unpack_raw_entry() will not allocate and return decompressed blobs if
they are larger than core.bigFileThreshold. sha1_object() may not be
called on those objects because there's no actual content.

sha1_object() is called later on those objects, where we can safely
use get_data_from_pack() to retrieve blob content for checking.
However we always do that when we definitely need the blob
content. And we often don't.

There are two cases when we may need object content. The first case is
when we find an in-repo blob with the same SHA-1. We need to do
collision test, byte-on-byte. If this test is on, the blob must be
loaded on memory (i.e. no streaming). Normally (e.g. in
fetch/pull/clone) this does not happen because git avoid to send
objects that client already has.

The other case is when --strict is specified and the object in
question is not a blob, which can't happen in reality becase we deal
with large _blobs_ here.

Note: --verify (or git-verify-pack) a pack from current repository
will trigger collision test on every object in the pack, which
effectively disables this patch. This could be easily worked around by
setting GIT_DIR to an imaginary place with no packs.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-23 09:08:54 -07:00