git/pack-objects.h
Jeff King 4cf2143e02 pack-objects: break delta cycles before delta-search phase
We do not allow cycles in the delta graph of a pack (i.e., A
is a delta of B which is a delta of A) for the obvious
reason that you cannot actually access any of the objects in
such a case.

There's a last-ditch attempt to notice cycles during the
write phase, during which we issue a warning to the user and
write one of the objects out in full. However, this is
"last-ditch" for two reasons:

  1. By this time, it's too late to find another delta for
     the object, so the resulting pack is larger than it
     otherwise could be.

  2. The warning is there because this is something that
     _shouldn't_ ever happen. If it does, then either:

       a. a pack we are reusing deltas from had its own
          cycle

       b. we are reusing deltas from multiple packs, and
          we found a cycle among them (i.e., A is a delta of
          B in one pack, but B is a delta of A in another,
          and we choose to use both deltas).

       c. there is a bug in the delta-search code

     So this code serves as a final check that none of these
     things has happened, warns the user, and prevents us
     from writing a bogus pack.

Right now, (2b) should never happen because of the static
ordering of packs in want_object_in_pack(). If two objects
have a delta relationship, then they must be in the same
pack, and therefore we will find them from that same pack.

However, a future patch would like to change that static
ordering, which will make (2b) a common occurrence. In
preparation, we should be able to handle those kinds of
cycles better. This patch does by introducing a
cycle-breaking step during the get_object_details() phase,
when we are deciding which deltas can be reused. That gives
us the chance to feed the objects into the delta search as
if the cycle did not exist.

We'll leave the detection and warning in the write_object()
phase in place, as it still serves as a check for case (2c).

This does mean we will stop warning for (2a). That case is
caused by bogus input packs, and we ideally would warn the
user about it.  However, since those cycles show up after
picking reusable deltas, they look the same as (2b) to us;
our new code will break the cycles early and the last-ditch
check will never see them.

We could do analysis on any cycles that we find to
distinguish the two cases (i.e., it is a bogus pack if and
only if every delta in the cycle is in the same pack), but
we don't need to. If there is a cycle inside a pack, we'll
run into problems not only reusing the delta, but accessing
the object data at all. So when we try to dig up the actual
size of the object, we'll hit that same cycle and kick in
our usual complain-and-try-another-source code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 10:44:13 -07:00

77 lines
2 KiB
C

#ifndef PACK_OBJECTS_H
#define PACK_OBJECTS_H
struct object_entry {
struct pack_idx_entry idx;
unsigned long size; /* uncompressed size */
struct packed_git *in_pack; /* already in pack */
off_t in_pack_offset;
struct object_entry *delta; /* delta base object */
struct object_entry *delta_child; /* deltified objects who bases me */
struct object_entry *delta_sibling; /* other deltified objects who
* uses the same base as me
*/
void *delta_data; /* cached delta (uncompressed) */
unsigned long delta_size; /* delta data size (uncompressed) */
unsigned long z_delta_size; /* delta data size (compressed) */
enum object_type type;
enum object_type in_pack_type; /* could be delta */
uint32_t hash; /* name hint hash */
unsigned int in_pack_pos;
unsigned char in_pack_header_size;
unsigned preferred_base:1; /*
* we do not pack this, but is available
* to be used as the base object to delta
* objects against.
*/
unsigned no_try_delta:1;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
/*
* State flags for depth-first search used for analyzing delta cycles.
*/
enum {
DFS_NONE = 0,
DFS_ACTIVE,
DFS_DONE
} dfs_state;
};
struct packing_data {
struct object_entry *objects;
uint32_t nr_objects, nr_alloc;
int32_t *index;
uint32_t index_size;
};
struct object_entry *packlist_alloc(struct packing_data *pdata,
const unsigned char *sha1,
uint32_t index_pos);
struct object_entry *packlist_find(struct packing_data *pdata,
const unsigned char *sha1,
uint32_t *index_pos);
static inline uint32_t pack_name_hash(const char *name)
{
uint32_t c, hash = 0;
if (!name)
return 0;
/*
* This effectively just creates a sortable number from the
* last sixteen non-whitespace characters. Last characters
* count "most", so things that end in ".c" sort together.
*/
while ((c = *name++) != 0) {
if (isspace(c))
continue;
hash = (hash >> 2) + (c << 24);
}
return hash;
}
#endif