Commit graph

6 commits

Author SHA1 Message Date
Junio C Hamano a49dd05fd0 pack-objects: reuse data from existing packs.
When generating a new pack, notice if we have already needed
objects in existing packs.  If an object is stored deltified,
and its base object is also what we are going to pack, then
reuse the existing deltified representation unconditionally,
bypassing all the expensive find_deltas() and try_deltas()
calls.

Also, notice if what we are going to write out exactly match
what is already in an existing pack (either deltified or just
compressed).  In such a case, we can just copy it instead of
going through the usual uncompressing & recompressing cycle.

Without this patch, in linux-2.6 repository with about 1500
loose objects and a single mega pack:

    $ git-rev-list --objects v2.6.16-rc3 >RL
    $ wc -l RL
    184141 RL
    $ time git-pack-objects p <RL
    Generating pack...
    Done counting 184141 objects.
    Packing 184141 objects....................
    a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2

    real    12m4.323s
    user    11m2.560s
    sys     0m55.950s

With this patch, the same input:

    $ time ../git.junio/git-pack-objects q <RL
    Generating pack...
    Done counting 184141 objects.
    Packing 184141 objects.....................
    a1fc7b3e537fcb9b3c46b7505df859f0a11e79d2
    Total 184141, written 184141, reused 182441

    real    1m2.608s
    user    0m55.090s
    sys     0m1.830s

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-17 02:11:38 -08:00
Nicolas Pitre d60fc1c864 remove delta-against-self bit
After experimenting with code to add the ability to encode a delta
against part of the deltified file, it turns out that resulting packs
are _bigger_ than when this ability is not used.  The raw delta output
might be smaller, but it doesn't compress as well using gzip with a
negative net saving on average.

Said bit would in fact be more useful to allow for encoding the copying
of chunks larger than 64KB providing more savings with large files.
This will correspond to packs version 3.

While the current code still produces packs version 2, it is made future
proof so pack versions 2 and 3 are accepted.  Any pack version 2 are
compatible with version 3 since the redefined bit was never used before.
When enough time has passed, code to use that bit to produce version 3
packs could be added.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-09 21:06:38 -08:00
Junio C Hamano f3bf922409 [PATCH] verify-pack updates.
Nico pointed out that having verify_pack.c and verify-pack.c was
confusing.  Rename verify_pack.c to pack-check.c as suggested,
and enhances the verification done quite a bit.

 - Built-in sha1_file unpacking knows that a base object of a
   deltified object _must_ be in the same pack, and takes
   advantage of that fact.

 - Earlier verify-pack command only checked the SHA1 sum for the
   entire pack file and did not look into its contents.  It now
   checks everything idx file claims to have unpacks correctly.

 - It now has a hook to give more detailed information for
   objects contained in the pack under -v flag.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-30 22:33:47 -07:00
Junio C Hamano f9253394a2 [PATCH] Add git-verify-pack command.
Given a list of <pack>.idx files, this command validates the
index file and the corresponding .pack file for consistency.

This patch also uses the same validation mechanism in fsck-cache
when the --full flag is used.

During normal operation, sha1_file.c verifies that a given .idx
file matches the .pack file by comparing the SHA1 checksum
stored in .idx file and .pack file as a minimum sanity check.
We may further want to check the pack signature and version when
we map the pack, but that would be a separate patch.

Earlier, errors to map a pack file was not flagged fatal but led
to a random fatal error later.  This version explicitly die()s
when such an error is detected.

Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-29 09:11:39 -07:00
Linus Torvalds 01247d8742 Make git pack files use little-endian size encoding
This makes it match the new delta encoding, and admittedly makes the
code easier to follow.

This also updates the PACK file version to 2, since this (and the delta
encoding change in the previous commit) are incompatible with the old
format.
2005-06-28 22:15:57 -07:00
Linus Torvalds a733cb606f Change pack file format. Hopefully for the last time.
This also adds a header with a signature, version info, and the number
of objects to the pack file.  It also encodes the file length and type
more efficiently.
2005-06-28 14:21:02 -07:00