Commit graph

27 commits

Author SHA1 Message Date
Jeff King 1190a1acf8 pack-objects: name pack files after trailer hash
Our current scheme for naming packfiles is to calculate the
sha1 hash of the sorted list of objects contained in the
packfile. This gives us a unique name, so we are reasonably
sure that two packs with the same name will contain the same
objects.

It does not, however, tell us that two such packs have the
exact same bytes. This makes things awkward if we repack the
same set of objects. Due to run-to-run variations, the bytes
may not be identical (e.g., changed zlib or git versions,
different source object reuse due to new packs in the
repository, or even different deltas due to races during a
multi-threaded delta search).

In theory, this could be helpful to a program that cares
that the packfile contains a certain set of objects, but
does not care about the particular representation. In
practice, no part of git makes use of that, and in many
cases it is potentially harmful. For example, if a dumb http
client fetches the .idx file, it must be sure to get the
exact .pack that matches it. Similarly, a partial transfer
of a .pack file cannot be safely resumed, as the actual
bytes may have changed.  This could also affect a local
client which opened the .idx and .pack files, closes the
.pack file (due to memory or file descriptor limits), and
then re-opens a changed packfile.

In all of these cases, git can detect the problem, as we
have the sha1 of the bytes themselves in the pack trailer
(which we verify on transfer), and the .idx file references
the trailer from the matching packfile. But it would be
simpler and more efficient to actually get the correct
bytes, rather than noticing the problem and having to
restart the operation.

This patch simply uses the pack trailer sha1 as the pack
name. It should be similarly unique, but covers the exact
representation of the objects. Other parts of git should not
care, as the pack name is returned by pack-objects and is
essentially opaque.

One test needs to be updated, because it actually corrupts a
pack and expects that re-packing the corrupted bytes will
use the same name. It won't anymore, but we can easily just
use the name that pack-objects hands back.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 15:40:11 -08:00
Nguyễn Thái Ngọc Duy 6a301345a5 pack-objects: do not accept "--index-version=version,"
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-01 13:03:46 -08:00
Junio C Hamano 3de89c9d42 verify-pack: use index-pack --verify
This finally gets rid of the inefficient verify-pack implementation that
walks objects in the packfile in their object name order and replaces it
with a call to index-pack --verify. As a side effect, it also removes
packed_object_info_detail() API which is rather expensive.

As this changes the way errors are reported (verify-pack used to rely on
the usual runtime error detection routine unpack_entry() to diagnose the
CRC errors in an entry in the *.idx file; index-pack --verify checks the
whole *.idx file in one go), update a test that expected the string "CRC"
to appear in the error message.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-05 22:45:38 -07:00
Junio C Hamano 3c9fc074c2 index-pack --verify: read anomalous offsets from v2 idx file
A pack v2 .idx file usually records offset using 64-bit representation
only when the offset does not fit within 31-bit, but you can handcraft
your .idx file to record smaller offset using 64-bit, storing all zero
in the upper 4-byte.  By inspecting the original idx file when running
index-pack --verify, encode such low offsets that do not need to be in
64-bit but are encoded using 64-bit just like the original idx file so
that we can still validate the pack/idx pair by comparing the idx file
recomputed with the original.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-27 23:29:03 -08:00
Junio C Hamano e337a04de2 index-pack: --verify
Given an existing .pack file and the .idx file that describes it,
this new mode of operation reads and re-index the packfile and makes
sure the existing .idx file matches the result byte-for-byte.

All the objects in the .pack file are validated during this operation as
well.  Unlike verify-pack, which visits each object described in the .idx
file in the SHA-1 order, index-pack efficiently exploits the delta-chain
to avoid rebuilding the objects that are used as the base of deltified
objects over and over again while validating the objects, resulting in
much quicker verification of the .pack file and its .idx file.

This version however cannot verify a .pack/.idx pair with a handcrafted v2
index that uses 64-bit offset representation for offsets that would fit
within 31-bit. You can create such an .idx file by giving a custom offset
to --index-version option to the command.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-27 23:29:03 -08:00
Jonathan Nieder a48fcd8369 tests: add missing &&
Breaks in a test assertion's && chain can potentially hide
failures from earlier commands in the chain.

Commands intended to fail should be marked with !, test_must_fail, or
test_might_fail.  The examples in this patch do not require that.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-09 11:59:49 -08:00
Ævar Arnfjörð Bjarmason fadb5156e4 tests: Skip tests in a way that makes sense under TAP
SKIP messages are now part of the TAP plan. A TAP harness now knows
why a particular test was skipped and can report that information. The
non-TAP harness built into Git's test-lib did nothing special with
these messages, and is unaffected by these changes.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-06-25 10:08:20 -07:00
Johannes Sixt 18bf879817 t5302: Use prerequisite tags to skip 64-bit offset tests
The effects of this patch can be tested on Linux by commenting out

  #define _FILE_OFFSET_BITS 64

in git-compat-util.h.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2009-03-22 17:26:44 +01:00
Johannes Sixt b689ccf6c9 t5300, t5302, t5303: Do not use /dev/zero
We do not have /dev/zero on Windows. This replaces it by data generated
with printf, perl, or echo. Most of the cases do not depend on that the
data is a stream of zero bytes, so we use something printable; nor is an
unlimited stream of data needed, so we produce only as many bytes as the
test cases need.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
2009-03-19 21:47:15 +01:00
Johannes Schindelin 1415be8f0f Force t5302 to use a single thread
If the packs are made using multiple threads, they are no longer identical
on the 4-core Xeon I tested on.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-15 21:54:12 -08:00
Junio C Hamano 7b51b77dbc Merge branch 'np/pack-safer'
* np/pack-safer:
  t5303: fix printf format string for portability
  t5303: work around printf breakage in dash
  pack-objects: don't leak pack window reference when splitting packs
  extend test coverage for latest pack corruption resilience improvements
  pack-objects: allow "fixing" a corrupted pack without a full repack
  make find_pack_revindex() aware of the nasty world
  make check_object() resilient to pack corruptions
  make packed_object_info() resilient to pack corruptions
  make unpack_object_header() non fatal
  better validation on delta base object offsets
  close another possibility for propagating pack corruption
2008-11-12 22:26:35 -08:00
Junio C Hamano 275ee50c81 Merge branch 'np/index-pack'
* np/index-pack:
  index-pack: don't leak leaf delta result
  improve index-pack tests
  fix multiple issues in index-pack
  index-pack: smarter memory usage during delta resolution
  index-pack: rationalize delta resolution code
2008-11-02 16:36:37 -08:00
Nicolas Pitre 0e8189e270 close another possibility for propagating pack corruption
Abstract
--------

With index v2 we have a per object CRC to allow quick and safe reuse of
pack data when repacking.  This, however, doesn't currently prevent a
stealth corruption from being propagated into a new pack when _not_
reusing pack data as demonstrated by the modification to t5302 included
here.

The Context
-----------

The Git database is all checksummed with SHA1 hashes.  Any kind of
corruption can be confirmed by verifying this per object hash against
corresponding data.  However this can be costly to perform systematically
and therefore this check is often not performed at run time when
accessing the object database.

First, the loose object format is entirely compressed with zlib which
already provide a CRC verification of its own when inflating data.  Any
disk corruption would be caught already in this case.

Then, packed objects are also compressed with zlib but only for their
actual payload.  The object headers and delta base references are not
deflated for obvious performance reasons, however this leave them
vulnerable to potentially undetected disk corruptions.  Object types
are often validated against the expected type when they're requested,
and deflated size must always match the size recorded in the object header,
so those cases are pretty much covered as well.

Where corruptions could go unnoticed is in the delta base reference.
Of course, in the OBJ_REF_DELTA case,  the odds for a SHA1 reference to
get corrupted so it actually matches the SHA1 of another object with the
same size (the delta header stores the expected size of the base object
to apply against) are virtually zero.  In the OBJ_OFS_DELTA case, the
reference is a pack offset which would have to match the start boundary
of a different base object but still with the same size, and although this
is relatively much more "probable" than in the OBJ_REF_DELTA case, the
probability is also about zero in absolute terms.  Still, the possibility
exists as demonstrated in t5302 and is certainly greater than a SHA1
collision, especially in the OBJ_OFS_DELTA case which is now the default
when repacking.

Again, repacking by reusing existing pack data is OK since the per object
CRC provided by index v2 guards against any such corruptions. What t5302
failed to test is a full repack in such case.

The Solution
------------

As unlikely as this kind of stealth corruption can be in practice, it
certainly isn't acceptable to propagate it into a freshly created pack.
But, because this is so unlikely, we don't want to pay the run time cost
associated with extra validation checks all the time either.  Furthermore,
consequences of such corruption in anything but repacking should be rather
visible, and even if it could be quite unpleasant, it still has far less
severe consequences than actively creating bad packs.

So the best compromize is to check packed object CRC when unpacking
objects, and only during the compression/writing phase of a repack, and
only when not streaming the result.  The cost of this is minimal (less
than 1% CPU time), and visible only with a full repack.

Someone with a stats background could provide an objective evaluation of
this, but I suspect that it's bad RAM that has more potential for data
corruptions at this point, even in those cases where this extra check
is not performed.  Still, it is best to prevent a known hole for
corruption when recreating object data into a new pack.

What about the streamed pack case?  Well, any client receiving a pack
must always consider that pack as untrusty and perform full validation
anyway, hence no such stealth corruption could be propagated to remote
repositoryes already.  It is therefore worthless doing local validation
in that case.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-02 15:22:15 -08:00
Nicolas Pitre 2b5c208f5b improve index-pack tests
Commit 9441b61dc5 introduced serious bugs in index-pack which are
described and fixed by commit ce3f6dc655.  However, despite the
boldness of those bugs, the test suite still passed.

This improves t5302-pack-index.sh so to ensure a much better code
path coverage.  With commit ce3f6dc655 reverted, 17 of the 26 tests
do fail now.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-22 18:08:58 -07:00
Nicolas Pitre a672ea6ac5 rehabilitate 'git index-pack' inside the object store
Before commit d0b92a3f6e it was possible to run 'git index-pack'
directly in the .git/objects/pack/ directory.  Restore that ability.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-21 13:20:03 -07:00
Nanako Shiraishi 3604e7c5c6 tests: use "git xyzzy" form (t3600 - t6999)
Converts tests between t3600-t6300.

Signed-off-by: Nanako Shiraishi <nanako3@lavabit.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-03 14:13:59 -07:00
Stephan Beyer d492b31caf t/: Use "test_must_fail git" instead of "! git"
This patch changes every occurrence of "! git" -- with the meaning
that a git call has to gracefully fail -- into "test_must_fail git".

This is useful to

 - make sure the test does not fail because of a signal,
   e.g. SIGSEGV, and

 - advertise the use of "test_must_fail" for new tests.

Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-13 13:21:26 -07:00
Nicolas Pitre 85fe23ed2a verify-pack: test for detection of index v2 object CRC mismatch
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-24 23:58:57 -07:00
Jeff King bbf08124e0 fix bsd shell negation
On some shells (notably /bin/sh on FreeBSD 6.1), the
construct

  foo && ! bar | baz

is true if

  foo && baz

whereas for most other shells (such as bash) is true if

  foo && ! baz

We can work around this by specifying

  foo && ! (bar | baz)

which works everywhere.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-13 21:44:48 -07:00
Jeff King b4ce54fc61 remove use of "tail -n 1" and "tail -1"
The "-n" syntax is not supported by System V versions of
tail (which prefer "tail -1"). Unfortunately "tail -1" is
not actually POSIX.  We had some of both forms in our
scripts.

Since neither form works everywhere, this patch replaces
both with the equivalent sed invocation:

  sed -ne '$p'

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-13 00:57:52 -07:00
Junio C Hamano 41ac414ea2 Sane use of test_expect_failure
Originally, test_expect_failure was designed to be the opposite
of test_expect_success, but this was a bad decision.  Most tests
run a series of commands that leads to the single command that
needs to be tested, like this:

    test_expect_{success,failure} 'test title' '
	setup1 &&
        setup2 &&
        setup3 &&
        what is to be tested
    '

And expecting a failure exit from the whole sequence misses the
point of writing tests.  Your setup$N that are supposed to
succeed may have failed without even reaching what you are
trying to test.  The only valid use of test_expect_failure is to
check a trivial single command that is expected to fail, which
is a minority in tests of Porcelain-ish commands.

This large-ish patch rewrites all uses of test_expect_failure to
use test_expect_success and rewrites the condition of what is
tested, like this:

    test_expect_success 'test title' '
	setup1 &&
        setup2 &&
        setup3 &&
        ! this command should fail
    '

test_expect_failure is redefined to serve as a reminder that
that test *should* succeed but due to a known breakage in git it
currently does not pass.  So if git-foo command should create a
file 'bar' but you discovered a bug that it doesn't, you can
write a test like this:

    test_expect_failure 'git-foo should create bar' '
        rm -f bar &&
        git foo &&
        test -f bar
    '

This construct acts similar to test_expect_success, but instead
of reporting "ok/FAIL" like test_expect_success does, the
outcome is reported as "FIXED/still broken".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-01 20:49:34 -08:00
Nicolas Pitre 5f9ffff308 rehabilitate some t5302 tests on 32-bit off_t machines
Commit 8ed2fca458 was a bit draconian in
skipping certain tests which should be perfectly valid even on platform
with a 32-bit off_t.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-15 21:18:07 -08:00
Johannes Sixt 8ed2fca458 t5302-pack-index: Skip tests of 64-bit offsets if necessary.
There are platforms where off_t is not 64 bits wide. In this case many tests
are doomed to fail. Let's skip them.

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-14 15:18:39 -08:00
Junio C Hamano 5be60078c9 Rewrite "git-frotz" to "git frotz"
This uses the remove-dashes target to replace "git-frotz" to "git frotz".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-02 22:52:14 -07:00
Shawn O. Pearce b3431bc603 Don't use seq in tests, not everyone has it
For example Mac OS X lacks the seq command.  So we cannot use it
there.  A good old while loop works just as good.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-05-02 13:24:23 -04:00
Junio C Hamano bd4b0aeb1f t5302: avoid using tail -c
A Large Angry SCM (gitzilla) noticed that on an unnamed platform, tail -c
wants its byte count as part of the option, not as a separate argument.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-23 22:05:22 -07:00
Nicolas Pitre 6e5417769c tests for various pack index features
This is a fairly complete list of tests for various aspects of pack
index versions 1 and  2.

Tests on index v2 include 32-bit and 64-bit offsets, as well as a nice
demonstration of the flawed repacking integrity checks that index
version 2 intend to solve over index version 1 with the per object CRC.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-11 19:32:03 -07:00