development/git - HydraGit

mirror of https://github.com/git/git synced 2024-08-24 18:26:02 +00:00

Author	SHA1	Message	Date
Jeff King	1190a1acf8	pack-objects: name pack files after trailer hash Our current scheme for naming packfiles is to calculate the sha1 hash of the sorted list of objects contained in the packfile. This gives us a unique name, so we are reasonably sure that two packs with the same name will contain the same objects. It does not, however, tell us that two such packs have the exact same bytes. This makes things awkward if we repack the same set of objects. Due to run-to-run variations, the bytes may not be identical (e.g., changed zlib or git versions, different source object reuse due to new packs in the repository, or even different deltas due to races during a multi-threaded delta search). In theory, this could be helpful to a program that cares that the packfile contains a certain set of objects, but does not care about the particular representation. In practice, no part of git makes use of that, and in many cases it is potentially harmful. For example, if a dumb http client fetches the .idx file, it must be sure to get the exact .pack that matches it. Similarly, a partial transfer of a .pack file cannot be safely resumed, as the actual bytes may have changed. This could also affect a local client which opens the .idx and .pack files, closes the .pack file (due to memory or file descriptor limits), and then re-opens a changed packfile. In all of these cases, git can detect the problem, as we have the sha1 of the bytes themselves in the pack trailer (which we verify on transfer), and the .idx file references the trailer from the matching packfile. But it would be simpler and more efficient to actually get the correct bytes, rather than noticing the problem and having to restart the operation. This patch simply uses the pack trailer sha1 as the pack name. It should be similarly unique, but covers the exact representation of the objects. Other parts of git should not care, as the pack name is returned by pack-objects and is essentially opaque. 
One test needs to be updated, because it actually corrupts a pack and expects that re-packing the corrupted bytes will use the same name. It won't anymore, but we can easily just use the name that pack-objects hands back. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-05 15:40:11 -08:00
Nguyễn Thái Ngọc Duy	6a301345a5	pack-objects: do not accept "--index-version=version," Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 13:03:46 -08:00
Junio C Hamano	3de89c9d42	verify-pack: use index-pack --verify This finally gets rid of the inefficient verify-pack implementation that walks objects in the packfile in their object name order and replaces it with a call to index-pack --verify. As a side effect, it also removes packed_object_info_detail() API which is rather expensive. As this changes the way errors are reported (verify-pack used to rely on the usual runtime error detection routine unpack_entry() to diagnose the CRC errors in an entry in the .idx file; index-pack --verify checks the whole .idx file in one go), update a test that expected the string "CRC" to appear in the error message. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-06-05 22:45:38 -07:00
Junio C Hamano	3c9fc074c2	index-pack --verify: read anomalous offsets from v2 idx file A pack v2 .idx file usually records an offset using the 64-bit representation only when the offset does not fit within 31 bits, but you can handcraft your .idx file to record a smaller offset using 64 bits, storing all zeros in the upper 4 bytes. When running index-pack --verify, inspect the original idx file and encode such low offsets, which do not need 64 bits but were recorded that way, in 64 bits just like the original idx file, so that we can still validate the pack/idx pair by comparing the recomputed idx file with the original. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-02-27 23:29:03 -08:00
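The 31-bit rule described in this commit can be illustrated with a small shell sketch. The helper name `idx_offset_width` is hypothetical and is not part of Git. The sketch only reports which representation a v2 .idx would normally use for a given offset, and it assumes the shell's `test` builtin compares 64-bit integers.

```sh
# Hypothetical helper (not part of Git): report how a v2 .idx file would
# normally record a pack offset. Offsets below 2^31 fit the 4-byte entry;
# anything larger needs an entry in the 64-bit offset table.
# Assumes the shell's "test" can compare integers wider than 32 bits.
idx_offset_width () {
	if test "$1" -lt 2147483648
	then
		echo 32
	else
		echo 64
	fi
}

idx_offset_width 12          # a small offset near the start of a pack
idx_offset_width 4294967296  # an offset 4 GiB into a pack
```

A handcrafted index of the kind the commit mentions would record the first offset in the 64-bit table even though this rule says it does not need to be there.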
Junio C Hamano	e337a04de2	index-pack: --verify Given an existing .pack file and the .idx file that describes it, this new mode of operation reads and re-indexes the packfile and makes sure the existing .idx file matches the result byte-for-byte. All the objects in the .pack file are validated during this operation as well. Unlike verify-pack, which visits each object described in the .idx file in the SHA-1 order, index-pack efficiently exploits the delta-chain to avoid rebuilding the objects that are used as the base of deltified objects over and over again while validating the objects, resulting in much quicker verification of the .pack file and its .idx file. This version however cannot verify a .pack/.idx pair with a handcrafted v2 index that uses 64-bit offset representation for offsets that would fit within 31 bits. You can create such an .idx file by giving a custom offset to the --index-version option of the command. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-02-27 23:29:03 -08:00
Jonathan Nieder	a48fcd8369	tests: add missing && Breaks in a test assertion's && chain can potentially hide failures from earlier commands in the chain. Commands intended to fail should be marked with !, test_must_fail, or test_might_fail. The examples in this patch do not require that. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-11-09 11:59:49 -08:00
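Why a break in the && chain hides failures can be seen in a few lines of plain shell. The function names below are made up for illustration and do not come from Git's test suite.

```sh
# A body with a broken && chain: the failure of "false" is lost, because
# only the status of the last command decides the result.
broken_chain () {
	true &&
	false
	true
}

# The same commands with the chain intact: the first failure is reported.
intact_chain () {
	true &&
	false &&
	true
}

if broken_chain
then
	echo "broken chain: reported success, failure hidden"
fi
if ! intact_chain
then
	echo "intact chain: reported failure"
fi
```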
Ævar Arnfjörð Bjarmason	fadb5156e4	tests: Skip tests in a way that makes sense under TAP SKIP messages are now part of the TAP plan. A TAP harness now knows why a particular test was skipped and can report that information. The non-TAP harness built into Git's test-lib did nothing special with these messages, and is unaffected by these changes. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-06-25 10:08:20 -07:00
Johannes Sixt	18bf879817	t5302: Use prerequisite tags to skip 64-bit offset tests The effects of this patch can be tested on Linux by commenting out #define _FILE_OFFSET_BITS 64 in git-compat-util.h. Signed-off-by: Johannes Sixt <j6t@kdbg.org>	2009-03-22 17:26:44 +01:00
Johannes Sixt	b689ccf6c9	t5300, t5302, t5303: Do not use /dev/zero We do not have /dev/zero on Windows. This replaces it with data generated with printf, perl, or echo. Most of the cases do not depend on the data being a stream of zero bytes, so we use something printable; nor is an unlimited stream of data needed, so we produce only as many bytes as the test cases need. Signed-off-by: Johannes Sixt <j6t@kdbg.org>	2009-03-19 21:47:15 +01:00
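As a rough sketch of the idea, a bounded amount of printable filler can be produced with printf alone. The helper `filler_bytes` is hypothetical and is not taken from the actual patch, which may generate its data differently.

```sh
# Hypothetical helper (not from the patch): emit exactly $1 bytes of
# printable filler without reading /dev/zero.
filler_bytes () {
	n=$1
	while test "$n" -gt 0
	do
		printf 'x'
		n=$(($n - 1))
	done
}

# Count the bytes produced to confirm the length.
filler_bytes 16 | wc -c
```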
Johannes Schindelin	1415be8f0f	Force t5302 to use a single thread If the packs are made using multiple threads, they are no longer identical on the 4-core Xeon I tested on. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Acked-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-12-15 21:54:12 -08:00
Junio C Hamano	7b51b77dbc	Merge branch 'np/pack-safer'
* np/pack-safer:
  t5303: fix printf format string for portability
  t5303: work around printf breakage in dash
  pack-objects: don't leak pack window reference when splitting packs
  extend test coverage for latest pack corruption resilience improvements
  pack-objects: allow "fixing" a corrupted pack without a full repack
  make find_pack_revindex() aware of the nasty world
  make check_object() resilient to pack corruptions
  make packed_object_info() resilient to pack corruptions
  make unpack_object_header() non fatal
  better validation on delta base object offsets
  close another possibility for propagating pack corruption	2008-11-12 22:26:35 -08:00
Junio C Hamano	275ee50c81	Merge branch 'np/index-pack'
* np/index-pack:
  index-pack: don't leak leaf delta result
  improve index-pack tests
  fix multiple issues in index-pack
  index-pack: smarter memory usage during delta resolution
  index-pack: rationalize delta resolution code	2008-11-02 16:36:37 -08:00
Nicolas Pitre	0e8189e270	close another possibility for propagating pack corruption

Abstract
--------

With index v2 we have a per object CRC to allow quick and safe reuse of pack data when repacking. This, however, doesn't currently prevent a stealth corruption from being propagated into a new pack when _not_ reusing pack data as demonstrated by the modification to t5302 included here.

The Context
-----------

The Git database is all checksummed with SHA1 hashes. Any kind of corruption can be confirmed by verifying this per object hash against corresponding data. However this can be costly to perform systematically and therefore this check is often not performed at run time when accessing the object database.

First, the loose object format is entirely compressed with zlib which already provides a checksum verification of its own when inflating data. Any disk corruption would be caught already in this case.

Then, packed objects are also compressed with zlib but only for their actual payload. The object headers and delta base references are not deflated for obvious performance reasons, however this leaves them vulnerable to potentially undetected disk corruptions. Object types are often validated against the expected type when they're requested, and inflated size must always match the size recorded in the object header, so those cases are pretty much covered as well.

Where corruptions could go unnoticed is in the delta base reference. Of course, in the OBJ_REF_DELTA case, the odds for a SHA1 reference to get corrupted so it actually matches the SHA1 of another object with the same size (the delta header stores the expected size of the base object to apply against) are virtually zero. In the OBJ_OFS_DELTA case, the reference is a pack offset which would have to match the start boundary of a different base object but still with the same size, and although this is relatively much more "probable" than in the OBJ_REF_DELTA case, the probability is also about zero in absolute terms. Still, the possibility exists as demonstrated in t5302 and is certainly greater than a SHA1 collision, especially in the OBJ_OFS_DELTA case which is now the default when repacking.

Again, repacking by reusing existing pack data is OK since the per object CRC provided by index v2 guards against any such corruptions. What t5302 failed to test is a full repack in such a case.

The Solution
------------

As unlikely as this kind of stealth corruption can be in practice, it certainly isn't acceptable to propagate it into a freshly created pack. But, because this is so unlikely, we don't want to pay the run time cost associated with extra validation checks all the time either. Furthermore, consequences of such corruption in anything but repacking should be rather visible, and even if it could be quite unpleasant, it still has far less severe consequences than actively creating bad packs.

So the best compromise is to check packed object CRC when unpacking objects, and only during the compression/writing phase of a repack, and only when not streaming the result. The cost of this is minimal (less than 1% CPU time), and visible only with a full repack.

Someone with a stats background could provide an objective evaluation of this, but I suspect that it's bad RAM that has more potential for data corruptions at this point, even in those cases where this extra check is not performed. Still, it is best to prevent a known hole for corruption when recreating object data into a new pack.

What about the streamed pack case? Well, any client receiving a pack must always consider that pack as untrusted and perform full validation anyway, hence no such stealth corruption could be propagated to remote repositories already. It is therefore worthless doing local validation in that case.

Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-11-02 15:22:15 -08:00
Nicolas Pitre	2b5c208f5b	improve index-pack tests Commit `9441b61dc5` introduced serious bugs in index-pack which are described and fixed by commit `ce3f6dc655`. However, despite the boldness of those bugs, the test suite still passed. This improves t5302-pack-index.sh so as to ensure much better code path coverage. With commit `ce3f6dc655` reverted, 17 of the 26 tests do fail now. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-10-22 18:08:58 -07:00
Nicolas Pitre	a672ea6ac5	rehabilitate 'git index-pack' inside the object store Before commit `d0b92a3f6e` it was possible to run 'git index-pack' directly in the .git/objects/pack/ directory. Restore that ability. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-10-21 13:20:03 -07:00
Nanako Shiraishi	3604e7c5c6	tests: use "git xyzzy" form (t3600 - t6999) Converts tests between t3600-t6300. Signed-off-by: Nanako Shiraishi <nanako3@lavabit.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-09-03 14:13:59 -07:00
Stephan Beyer	d492b31caf	t/: Use "test_must_fail git" instead of "! git" This patch changes every occurrence of "! git" -- with the meaning that a git call has to gracefully fail -- into "test_must_fail git". This is useful to - make sure the test does not fail because of a signal, e.g. SIGSEGV, and - advertise the use of "test_must_fail" for new tests. Signed-off-by: Stephan Beyer <s-beyer@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-07-13 13:21:26 -07:00
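The difference between "! git" and "test_must_fail git" can be sketched with a simplified stand-in. The function `must_fail` below is hypothetical and is not the real helper from Git's test-lib, which handles more cases. It only shows the core idea: a plain non-zero exit is accepted, while success or death by a signal (status above 128) is rejected.

```sh
# Hypothetical, simplified stand-in for test_must_fail (not the real
# implementation): accept an ordinary failure, reject success and reject
# termination by a signal, which shells report as a status above 128.
must_fail () {
	if "$@"
	then
		echo "command succeeded unexpectedly" >&2
		return 1
	else
		status=$?
	fi
	if test "$status" -gt 128
	then
		echo "command died of a signal (status $status)" >&2
		return 1
	fi
	return 0
}

if must_fail false
then
	echo "ordinary failure: accepted"
fi
# A plain "!" would accept this too; must_fail does not.
if ! must_fail sh -c 'kill -s TERM $$' 2>/dev/null
then
	echo "death by signal: rejected"
fi
```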
Nicolas Pitre	85fe23ed2a	verify-pack: test for detection of index v2 object CRC mismatch Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-06-24 23:58:57 -07:00
Jeff King	bbf08124e0	fix bsd shell negation On some shells (notably /bin/sh on FreeBSD 6.1), the construct foo && ! bar | baz is true if foo && baz, whereas for most other shells (such as bash) it is true if foo && ! baz. We can work around this by specifying foo && ! (bar | baz), which works everywhere. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-05-13 21:44:48 -07:00
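The portable form with an explicit subshell might look like the sketch below. The helper name `no_match` and its arguments are invented for illustration. The pipeline inside the parentheses plays the role of bar | baz.

```sh
# Hypothetical helper: succeed only when the text in $1 does NOT contain
# the pattern in $2. The parentheses make the negation apply to the whole
# pipeline, which is the form the commit describes as working everywhere.
no_match () {
	! (printf '%s\n' "$1" | grep -q "$2")
}

if true && no_match "some output" "absent"
then
	echo "negated pipeline is true, as intended"
fi
```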
Jeff King	b4ce54fc61	remove use of "tail -n 1" and "tail -1" The "-n" syntax is not supported by System V versions of tail (which prefer "tail -1"). Unfortunately "tail -1" is not actually POSIX. We had some of both forms in our scripts. Since neither form works everywhere, this patch replaces both with the equivalent sed invocation: sed -ne '$p' Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-03-13 00:57:52 -07:00
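The sed invocation from this commit can be tried directly. The wrapper function `last_line` is only a naming convenience added here and is not part of the patch.

```sh
# Print only the last line of standard input, using the sed invocation
# named in the commit message instead of "tail -n 1" or "tail -1".
last_line () {
	sed -ne '$p'
}

printf 'first\nsecond\nthird\n' | last_line
```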
Junio C Hamano	41ac414ea2	Sane use of test_expect_failure

Originally, test_expect_failure was designed to be the opposite of test_expect_success, but this was a bad decision. Most tests run a series of commands that leads to the single command that needs to be tested, like this:

    test_expect_{success,failure} 'test title' '
        setup1 &&
        setup2 &&
        setup3 &&
        what is to be tested
    '

And expecting a failure exit from the whole sequence misses the point of writing tests. Your setup$N that are supposed to succeed may have failed without even reaching what you are trying to test. The only valid use of test_expect_failure is to check a trivial single command that is expected to fail, which is a minority in tests of Porcelain-ish commands.

This large-ish patch rewrites all uses of test_expect_failure to use test_expect_success and rewrites the condition of what is tested, like this:

    test_expect_success 'test title' '
        setup1 &&
        setup2 &&
        setup3 &&
        ! this command should fail
    '

test_expect_failure is redefined to serve as a reminder that the test should succeed but due to a known breakage in git it currently does not pass. So if git-foo command should create a file 'bar' but you discovered a bug that it doesn't, you can write a test like this:

    test_expect_failure 'git-foo should create bar' '
        rm -f bar &&
        git foo &&
        test -f bar
    '

This construct acts similar to test_expect_success, but instead of reporting "ok/FAIL" like test_expect_success does, the outcome is reported as "FIXED/still broken".

Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-02-01 20:49:34 -08:00
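The two reporting modes described in this commit can be mimicked with a toy harness. The functions `expect_success` and `expect_known_breakage` are hypothetical and far simpler than Git's test-lib. They borrow only the "ok/FAIL" and "FIXED/still broken" labels from the commit message, and the path `/nonexistent/bar` is an assumed missing file.

```sh
# Toy harness (hypothetical, not Git's test-lib): run a test body in a
# subshell and report the outcome with the labels from the commit message.
expect_success () {
	if (eval "$2") >/dev/null 2>&1
	then
		echo "ok - $1"
	else
		echo "FAIL - $1"
	fi
}

# Same body handling, but for tests documenting a known breakage.
expect_known_breakage () {
	if (eval "$2") >/dev/null 2>&1
	then
		echo "FIXED - $1"
	else
		echo "still broken - $1"
	fi
}

# The expected failure is negated inside the body, so the setup steps
# must still succeed for the test to pass.
expect_success 'setup then negated command' 'true && true && ! false'
expect_known_breakage 'bar is not created yet' 'true && test -f /nonexistent/bar'
```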
Nicolas Pitre	5f9ffff308	rehabilitate some t5302 tests on 32-bit off_t machines Commit `8ed2fca458` was a bit draconian in skipping certain tests which should be perfectly valid even on platforms with a 32-bit off_t. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-11-15 21:18:07 -08:00
Johannes Sixt	8ed2fca458	t5302-pack-index: Skip tests of 64-bit offsets if necessary. There are platforms where off_t is not 64 bits wide. In this case many tests are doomed to fail. Let's skip them. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-11-14 15:18:39 -08:00
Junio C Hamano	5be60078c9	Rewrite "git-frotz" to "git frotz" This uses the remove-dashes target to replace "git-frotz" with "git frotz". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-07-02 22:52:14 -07:00
Shawn O. Pearce	b3431bc603	Don't use seq in tests, not everyone has it For example Mac OS X lacks the seq command. So we cannot use it there. A good old while loop works just as well. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-05-02 13:24:23 -04:00
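A while loop that stands in for seq could look like the following sketch. The function name `count_up` is made up here, and the loop in the actual patch may be written differently.

```sh
# Hypothetical replacement for "seq FIRST LAST": print the integers from
# $1 to $2, one per line, using only POSIX shell features.
count_up () {
	i=$1
	while test "$i" -le "$2"
	do
		echo "$i"
		i=$(($i + 1))
	done
}

count_up 1 5
```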
Junio C Hamano	bd4b0aeb1f	t5302: avoid using tail -c A Large Angry SCM (gitzilla) noticed that on an unnamed platform, tail -c wants its byte count as part of the option, not as a separate argument. Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-23 22:05:22 -07:00
Nicolas Pitre	6e5417769c	tests for various pack index features This is a fairly complete list of tests for various aspects of pack index versions 1 and 2. Tests on index v2 include 32-bit and 64-bit offsets, as well as a nice demonstration of the flawed repacking integrity checks that index version 2 intends to solve over index version 1 with the per object CRC. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-11 19:32:03 -07:00

27 commits