Commit graph

8 commits

Author SHA1 Message Date
Jeff King 51054177b3 index-pack: detect local corruption in collision check
When we notice that we have a local copy of an incoming
object, we compare the two objects to make sure we haven't
found a collision. Before we get to the actual object
bytes, though, we compare the type and size from
sha1_object_info().

If our local object is corrupted, then the type will be
OBJ_BAD, which obviously will not match the incoming type,
and we'll report "SHA1 COLLISION FOUND" (with capital
letters and everything). This is confusing, as the problem
is not a collision but rather local corruption. We should
report that instead (just like we do if reading the rest of
the object content fails a few lines later).

Note that we _could_ just ignore the error and mark it as a
non-collision. That would let you "git fetch" to replace a
corrupted object. But it's not a very reliable method for
repairing a repository. The earlier want/have negotiation
tries to get the other side to omit objects we already have,
and it would not realize that we are "missing" this
corrupted object. So we're better off complaining loudly
when we see corruption, and letting the user take more
drastic measures to repair (like making a full clone
elsewhere and copying the pack into place).

Note that the test sets transfer.unpackLimit in the
receiving repository so that we use index-pack (which is
what does the collision check). Normally for such a small
push we'd use unpack-objects, which would simply try to
write the loose object, and discard the new one when we see
that there's already an old one.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-01 10:48:11 -07:00
Jeff King 93cff9a978 sha1_loose_object_info: return error for corrupted objects
When sha1_loose_object_info() finds that a loose object file
cannot be stat(2)ed or mmap(2)ed, it returns -1 to signal an
error to the caller.  However, if it found that the loose
object file is corrupt and the object data cannot be used
from it, it stuffs OBJ_BAD into "type" field of the
object_info, but returns zero (i.e., success), which can
confuse callers.

This is due to 052fe5eac (sha1_loose_object_info: make type
lookup optional, 2013-07-12), which switched the return to a
strict success/error, rather than returning the type (but
botched the return).

Callers of regular sha1_object_info() don't notice the
difference, as that function returns the type (which is
OBJ_BAD in this case). However, direct callers of
sha1_object_info_extended() see the function return success,
but without setting any meaningful values in the object_info
struct, leading them to access potentially uninitialized
memory.

The easiest way to see the bug is via "cat-file -s", which
will happily ignore the corruption and report whatever
value happened to be in the "size" variable.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-01 10:45:16 -07:00
Jeff King d3b34622f6 clone: leave repo in place after checkout errors
If we manage to clone a remote repository but run into an
error in the checkout, it is probably sane to leave the repo
directory in place. That lets the user examine the situation
without spending time to re-clone from the remote (which may
be a lengthy process).

Rather than try to convert each die() from the checkout code
path into an error(), we simply set a flag that tells the
"remove_junk" atexit function to print a helpful message and
leave the repo in place.

Note that the test added in this patch actually passes
without the code change. The reason is that the cleanup code
is buggy; we chdir into the working tree for the checkout,
but still may use relative paths to remove the directories
(which means if you cloned into "foo", we would accidentally
remove "foo" from the working tree!).  There's no point in
fixing it now, since this patch means we will never try to
remove anything after the chdir, anyway.

[jc: replaced the message with a more succinct version from
Jonathan]

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-29 15:20:55 -07:00
Jeff King 0433ad128c clone: run check_everything_connected
When we fetch from a remote, we do a revision walk to make
sure that what we received is connected to our existing
history. We do not do the same check for clone, which should
be able to check that we received an intact history graph.

The upside of this patch is that it will make clone more
resilient against propagating repository corruption. The
downside is that we will now traverse "rev-list --objects
--all" down to the roots, which may take some time (it is
especially noticeable for a "--local --bare" clone).

Note that we need to adjust t5710, which tries to make such
a bogus clone. Rather than checking after the fact that our
clone is bogus, we can simplify it to just make sure "git
clone" reports failure.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 13:47:18 -07:00
Jeff King 0aac7bb287 clone: die on errors from unpack_trees
When clone is populating the working tree, it ignores the
return status from unpack_trees; this means we may report a
successful clone, even when the checkout fails.

When checkout fails, we may want to leave the $GIT_DIR in
place, as it might be possible to recover the data through
further use of "git checkout" (e.g., if the checkout failed
due to a transient error, disk full, etc). However, we
already die on a number of other checkout-related errors, so
this patch follows that pattern.

In addition to marking a now-passing test, we need to adjust
t5710, which blindly assumed it could make bogus clones of
very deep alternates hierarchies. By using "--bare", we can
avoid it actually touching any objects.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 13:47:15 -07:00
Jeff King 0e15ad9b73 add tests for cloning corrupted repositories
We try not to let corruption pass unnoticed over fetches and
clones. For the most part, this works, but there are some
broken corner cases, including:

  1. We do not detect missing objects over git-aware
     transports. This is a little hard to test, because the
     sending side will actually complain about the missing
     object.

     To fool it, we corrupt a repository such that we have a
     "misnamed" object: it claims to be sha1 X, but is
     really Y. This lets the sender blindly transmit it, but
     it is the receiver's responsibility to verify that what
     it got is sane (and it does not).

  2. We do not detect missing or misnamed blobs during the
     checkout phase of clone.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 13:47:11 -07:00
Jeff King d9c31e14d0 streaming_write_entry: propagate streaming errors
When we are streaming an index blob to disk, we store the
error from stream_blob_to_fd in the "result" variable, and
then immediately overwrite that with the return value of
"close". That means we catch errors on close (e.g., problems
committing the file to disk), but miss anything which
happened before then.

We can fix this by using bitwise-OR to accumulate errors in
our result variable.

While we're here, we can also simplify the error handling
with an early return, which makes it easier to see under
which circumstances we need to clean up.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 13:47:09 -07:00
Jeff King 7b6257b0d4 add test for streaming corrupt blobs
We do not have many tests for handling corrupt objects. This
new test at least checks that we detect a byte error in a
corrupt blob object while streaming it out with cat-file.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 13:47:06 -07:00