Merge branch 'cc/delta-islands'

Lift code from GitHub to restrict delta computation so that an
object that exists in one fork is not made into a delta against
another object that does not appear in the same forked repository.

* cc/delta-islands:
  pack-objects: move 'layer' into 'struct packing_data'
  pack-objects: move tree_depth into 'struct packing_data'
  t5320: tests for delta islands
  repack: add delta-islands support
  pack-objects: add delta-islands support
  pack-objects: refactor code into compute_layer_order()
  Add delta-islands.{c,h}
Junio C Hamano 2018-09-17 13:53:55 -07:00
commit f3504ea3dd
11 changed files with 936 additions and 47 deletions

@ -2684,6 +2684,21 @@ Note that changing the compression level will not automatically recompress
all existing objects. You can force recompression by passing the -F option
to linkgit:git-repack[1].
pack.island::
An extended regular expression configuring a set of delta
islands. See "DELTA ISLANDS" in linkgit:git-pack-objects[1]
for details.
pack.islandCore::
Specify the name of an island whose objects are packed first.
This creates a kind of pseudo-pack at the front of one pack, so
that the objects from the specified island are hopefully faster
to copy into any pack that is served to a user requesting these
objects. In practice this means that the specified island should
likely correspond to what is most commonly cloned from the
repository. See also "DELTA ISLANDS" in
linkgit:git-pack-objects[1].
pack.deltaCacheSize::
The maximum memory in bytes used for caching deltas in
linkgit:git-pack-objects[1] before writing them out to a pack.
@ -3218,6 +3233,10 @@ repack.packKeptObjects::
index is being written (either via `--write-bitmap-index` or
`repack.writeBitmaps`).
repack.useDeltaIslands::
If set to true, makes `git repack` act as if `--delta-islands`
was passed. Defaults to `false`.
repack.writeBitmaps::
When true, git will write a bitmap index when packing all
objects to disk (e.g., when `git repack -a` is run). This

@ -289,6 +289,103 @@ Unexpected missing object will raise an error.
--unpack-unreachable::
Keep unreachable objects in loose form. This implies `--revs`.
--delta-islands::
Restrict delta matches based on "islands". See DELTA ISLANDS
below.
DELTA ISLANDS
-------------
When possible, `pack-objects` tries to reuse existing on-disk deltas to
avoid having to search for new ones on the fly. This is an important
optimization for serving fetches, because it means the server can avoid
inflating most objects at all and just send the bytes directly from
disk. This optimization can't work when an object is stored as a delta
against a base which the receiver does not have (and which we are not
already sending). In that case the server "breaks" the delta and has to
find a new one, which has a high CPU cost. Therefore it's important for
performance that the set of objects in on-disk delta relationships match
what a client would fetch.
In a normal repository, this tends to work automatically. The objects
are mostly reachable from the branches and tags, and that's what clients
fetch. Any deltas we find on the server are likely to be between objects
the client has or will have.
But in some repository setups, you may have several related but separate
groups of ref tips, with clients tending to fetch those groups
independently. For example, imagine that you are hosting several "forks"
of a repository in a single shared object store, and letting clients
view them as separate repositories through `GIT_NAMESPACE` or separate
repos using the alternates mechanism. A naive repack may find that the
optimal delta for an object is against a base that is only found in
another fork. But when a client fetches, they will not have the base
object, and we'll have to find a new delta on the fly.
A similar situation may exist if you have many refs outside of
`refs/heads/` and `refs/tags/` that point to related objects (e.g.,
`refs/pull` or `refs/changes` used by some hosting providers). By
default, clients fetch only heads and tags, and deltas against objects
found only in those other groups cannot be sent as-is.
Delta islands solve this problem by allowing you to group your refs into
distinct "islands". Pack-objects computes which objects are reachable
from which islands, and refuses to make a delta from an object `A`
against a base which is not present in all of `A`'s islands. This
results in slightly larger packs (because we miss some delta
opportunities), but guarantees that a fetch of one island will not have
to recompute deltas on the fly due to crossing island boundaries.
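The rule above amounts to a subset test on island membership. A minimal sketch, assuming each object's islands fit in a single 32-bit mask (the names `island_mask`, `island_bit` and `may_use_as_base` are hypothetical and are not part of this commit):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bit N set means "reachable from island N". */
typedef uint32_t island_mask;

/* Build the mask for a single island; unsigned shift avoids overflow at bit 31. */
static island_mask island_bit(unsigned n)
{
	assert(n < 32);
	return (island_mask)1 << n;
}

/*
 * A base is acceptable only if it is present in every island that the
 * target object belongs to, i.e. the target's islands are a subset of
 * the base's islands.
 */
static int may_use_as_base(island_mask target, island_mask base)
{
	return (target & base) == target;
}
```

With this sketch, an object found only in island 0 may delta against a base found in islands 0 and 1, but not the other way around.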
When repacking with delta islands the delta window tends to get
clogged with candidates that are forbidden by the config. Repacking
with a big `--window` helps (and doesn't take as long as it otherwise
might because we can reject some object pairs based on islands before
doing any computation on the content).
Islands are configured via the `pack.island` option, which can be
specified multiple times. Each value is a left-anchored regular
expression matching refnames. For example:
-------------------------------------------
[pack]
island = refs/heads/
island = refs/tags/
-------------------------------------------
puts heads and tags into an island (whose name is the empty string; see
below for more on naming). Any ref which does not match those regular
expressions (e.g., `refs/pull/123`) is not in any island. Any object
which is reachable only from `refs/pull/` (but not heads or tags) is
therefore not a candidate to be used as a base for `refs/heads/`.
Refs are grouped into islands based on their "names", and two regexes
that produce the same name are considered to be in the same
island. The names are computed from the regexes by concatenating any
capture groups from the regex, with a '-' dash in between. (And if
there are no capture groups, then the name is the empty string, as in
the above example.) This allows you to create arbitrary numbers of
islands. Only up to 14 such capture groups are supported though.
For example, imagine you store the refs for each fork in
`refs/virtual/ID`, where `ID` is a numeric identifier. You might then
configure:
-------------------------------------------
[pack]
island = refs/virtual/([0-9]+)/heads/
island = refs/virtual/([0-9]+)/tags/
island = refs/virtual/([0-9]+)/(pull)/
-------------------------------------------
That puts the heads and tags for each fork in their own island (named
"1234" or similar), and the pull refs for each go into their own
"1234-pull".
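The naming rule can be sketched with POSIX regular expressions. This is an illustration under stated assumptions, not the code from this commit: the function name `island_name_for_ref`, the fixed 512-byte pattern buffer and the truncation behaviour are all hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

#define MAX_MATCHES 16

/*
 * Match `refname` against a left-anchored extended regex and join its
 * capture groups with '-' into `out`.
 * Returns 1 on match, 0 on no match, -1 if the regex does not compile.
 */
static int island_name_for_ref(const char *pattern, const char *refname,
			       char *out, size_t outlen)
{
	char anchored[512];
	regex_t re;
	regmatch_t m[MAX_MATCHES];
	size_t used = 0;
	int i;

	assert(out && outlen > 0);
	assert(strlen(pattern) < sizeof(anchored) - 1);

	/* Anchor the pattern on the left unless it already is. */
	snprintf(anchored, sizeof(anchored), "%s%s",
		 pattern[0] == '^' ? "" : "^", pattern);
	if (regcomp(&re, anchored, REG_EXTENDED))
		return -1;
	if (regexec(&re, refname, MAX_MATCHES, m, 0)) {
		regfree(&re);
		return 0;
	}

	out[0] = '\0';
	for (i = 1; i < MAX_MATCHES; i++) {
		size_t len;

		if (m[i].rm_so == -1)
			continue; /* group absent or did not participate */
		len = (size_t)(m[i].rm_eo - m[i].rm_so);
		if (used + len + 2 > outlen)
			break; /* sketch only: silently truncate */
		if (used)
			out[used++] = '-';
		memcpy(out + used, refname + m[i].rm_so, len);
		used += len;
		out[used] = '\0';
	}
	regfree(&re);
	return 1;
}
```

Under these assumptions, `refs/virtual/([0-9]+)/(pull)/` applied to `refs/virtual/1234/pull/7` yields the name `1234-pull`, while a pattern without capture groups yields the empty string.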
Note that we pick a single island for each ref to go into, using "last
one wins" ordering among the matching regexes (which allows repo-specific
config to take precedence over user-wide config, and so forth).
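The "last one wins" selection can be sketched as a backwards walk over the configured patterns. The helper `last_matching_pattern` below is hypothetical; instead of prepending `^` to each pattern, it checks that the leftmost match starts at offset 0, which is a swapped-in way of getting left-anchored behaviour.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return the index of the last pattern (in configuration order) that
 * matches at the start of `refname`, or -1 if none does. Patterns
 * that fail to compile are skipped in this sketch.
 */
static int last_matching_pattern(const char **patterns, size_t nr,
				 const char *refname)
{
	size_t i;

	assert(patterns && refname);
	for (i = nr; i-- > 0; ) {
		regex_t re;
		regmatch_t m;
		int matched;

		if (regcomp(&re, patterns[i], REG_EXTENDED))
			continue;
		/* The leftmost match must begin at offset 0 to count. */
		matched = !regexec(&re, refname, 1, &m, 0) && m.rm_so == 0;
		regfree(&re);
		if (matched)
			return (int)i;
	}
	return -1;
}
```

Given the patterns `refs/heads/(.*)` followed by `refs/heads/`, a ref such as `refs/heads/one` is assigned by the second pattern, so all heads share one island.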
SEE ALSO
--------
linkgit:git-rev-list[1]

@ -160,6 +160,11 @@ depth is 4095.
being removed. In addition, any unreachable loose objects will
be packed (and their loose counterparts removed).
-i::
--delta-islands::
Pass the `--delta-islands` option to `git-pack-objects`. See
linkgit:git-pack-objects[1].
Configuration
-------------

@ -850,6 +850,7 @@ LIB_OBJS += csum-file.o
LIB_OBJS += ctype.o
LIB_OBJS += date.o
LIB_OBJS += decorate.o
LIB_OBJS += delta-islands.o
LIB_OBJS += diffcore-break.o
LIB_OBJS += diffcore-delta.o
LIB_OBJS += diffcore-order.o

@ -24,6 +24,7 @@
#include "streaming.h"
#include "thread-utils.h"
#include "pack-bitmap.h"
#include "delta-islands.h"
#include "reachable.h"
#include "sha1-array.h"
#include "argv-array.h"
@ -62,6 +63,7 @@ static struct packing_data to_pack;
static struct pack_idx_entry **written_list;
static uint32_t nr_result, nr_written, nr_seen;
static struct bitmap_index *bitmap_git;
static uint32_t write_layer;
static int non_empty;
static int reuse_delta = 1, reuse_object = 1;
@ -97,6 +99,8 @@ static uint16_t write_bitmap_options;
static int exclude_promisor_objects;
static int use_delta_islands;
static unsigned long delta_cache_size = 0;
static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE;
static unsigned long cache_max_small_delta_size = 1000;
@ -616,7 +620,7 @@ static inline void add_to_write_order(struct object_entry **wo,
unsigned int *endp,
struct object_entry *e)
{
if (e->filled)
if (e->filled || oe_layer(&to_pack, e) != write_layer)
return;
wo[(*endp)++] = e;
e->filled = 1;
@ -676,9 +680,58 @@ static void add_family_to_write_order(struct object_entry **wo,
add_descendants_to_write_order(wo, endp, root);
}
static void compute_layer_order(struct object_entry **wo, unsigned int *wo_end)
{
unsigned int i, last_untagged;
struct object_entry *objects = to_pack.objects;
for (i = 0; i < to_pack.nr_objects; i++) {
if (objects[i].tagged)
break;
add_to_write_order(wo, wo_end, &objects[i]);
}
last_untagged = i;
/*
* Then fill all the tagged tips.
*/
for (; i < to_pack.nr_objects; i++) {
if (objects[i].tagged)
add_to_write_order(wo, wo_end, &objects[i]);
}
/*
* And then all remaining commits and tags.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
if (oe_type(&objects[i]) != OBJ_COMMIT &&
oe_type(&objects[i]) != OBJ_TAG)
continue;
add_to_write_order(wo, wo_end, &objects[i]);
}
/*
* And then all the trees.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
if (oe_type(&objects[i]) != OBJ_TREE)
continue;
add_to_write_order(wo, wo_end, &objects[i]);
}
/*
* Finally all the rest in really tight order
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
if (!objects[i].filled && oe_layer(&to_pack, &objects[i]) == write_layer)
add_family_to_write_order(wo, wo_end, &objects[i]);
}
}
static struct object_entry **compute_write_order(void)
{
unsigned int i, wo_end, last_untagged;
uint32_t max_layers = 1;
unsigned int i, wo_end;
struct object_entry **wo;
struct object_entry *objects = to_pack.objects;
@ -709,52 +762,14 @@ static struct object_entry **compute_write_order(void)
*/
for_each_tag_ref(mark_tagged, NULL);
/*
* Give the objects in the original recency order until
* we see a tagged tip.
*/
if (use_delta_islands)
max_layers = compute_pack_layers(&to_pack);
ALLOC_ARRAY(wo, to_pack.nr_objects);
for (i = wo_end = 0; i < to_pack.nr_objects; i++) {
if (objects[i].tagged)
break;
add_to_write_order(wo, &wo_end, &objects[i]);
}
last_untagged = i;
wo_end = 0;
/*
* Then fill all the tagged tips.
*/
for (; i < to_pack.nr_objects; i++) {
if (objects[i].tagged)
add_to_write_order(wo, &wo_end, &objects[i]);
}
/*
* And then all remaining commits and tags.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
if (oe_type(&objects[i]) != OBJ_COMMIT &&
oe_type(&objects[i]) != OBJ_TAG)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
/*
* And then all the trees.
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
if (oe_type(&objects[i]) != OBJ_TREE)
continue;
add_to_write_order(wo, &wo_end, &objects[i]);
}
/*
* Finally all the rest in really tight order
*/
for (i = last_untagged; i < to_pack.nr_objects; i++) {
if (!objects[i].filled)
add_family_to_write_order(wo, &wo_end, &objects[i]);
}
for (; write_layer < max_layers; ++write_layer)
compute_layer_order(wo, &wo_end);
if (wo_end != to_pack.nr_objects)
die(_("ordered %u objects, expected %"PRIu32),
@ -1544,7 +1559,8 @@ static void check_object(struct object_entry *entry)
if (base_ref && (
(base_entry = packlist_find(&to_pack, base_ref, NULL)) ||
(thin &&
bitmap_has_sha1_in_uninteresting(bitmap_git, base_ref)))) {
bitmap_has_sha1_in_uninteresting(bitmap_git, base_ref))) &&
in_same_island(&entry->idx.oid, &base_entry->idx.oid)) {
/*
* If base_ref was set above that means we wish to
* reuse delta data, and either we found that object in
@ -1867,6 +1883,11 @@ static int type_size_sort(const void *_a, const void *_b)
return -1;
if (a->preferred_base < b->preferred_base)
return 1;
if (use_delta_islands) {
int island_cmp = island_delta_cmp(&a->idx.oid, &b->idx.oid);
if (island_cmp)
return island_cmp;
}
if (a_size > b_size)
return -1;
if (a_size < b_size)
@ -2027,6 +2048,9 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
if (trg_size < src_size / 32)
return 0;
if (!in_same_island(&trg->entry->idx.oid, &src->entry->idx.oid))
return 0;
/* Load data if not already done */
if (!trg->data) {
read_lock();
@ -2569,6 +2593,9 @@ static void prepare_pack(int window, int depth)
uint32_t i, nr_deltas;
unsigned n;
if (use_delta_islands)
resolve_tree_islands(progress, &to_pack);
get_object_details();
/*
@ -2732,6 +2759,9 @@ static void show_commit(struct commit *commit, void *data)
if (write_bitmap_index)
index_commit_for_bitmap(commit);
if (use_delta_islands)
propagate_island_marks(commit);
}
static void show_object(struct object *obj, const char *name, void *data)
@ -2739,6 +2769,19 @@ static void show_object(struct object *obj, const char *name, void *data)
add_preferred_base_object(name);
add_object_entry(&obj->oid, obj->type, name, 0);
obj->flags |= OBJECT_ADDED;
if (use_delta_islands) {
const char *p;
unsigned depth = 0;
struct object_entry *ent;
for (p = strchr(name, '/'); p; p = strchr(p + 1, '/'))
depth++;
ent = packlist_find(&to_pack, obj->oid.hash, NULL);
if (ent && depth > oe_tree_depth(&to_pack, ent))
oe_set_tree_depth(&to_pack, ent, depth);
}
}
static void show_object__ma_allow_any(struct object *obj, const char *name, void *data)
@ -3064,6 +3107,9 @@ static void get_object_list(int ac, const char **av)
if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
return;
if (use_delta_islands)
load_delta_islands();
if (prepare_revision_walk(&revs))
die(_("revision walk setup failed"));
mark_edges_uninteresting(&revs, show_edge);
@ -3242,6 +3288,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
option_parse_missing_action },
OPT_BOOL(0, "exclude-promisor-objects", &exclude_promisor_objects,
N_("do not pack objects in promisor packfiles")),
OPT_BOOL(0, "delta-islands", &use_delta_islands,
N_("respect islands during delta compression")),
OPT_END(),
};
@ -3368,6 +3416,9 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (pack_to_stdout || !rev_list_all)
write_bitmap_index = 0;
if (use_delta_islands)
argv_array_push(&rp, "--topo-order");
if (progress && all_progress_implied)
progress = 2;

@ -15,6 +15,7 @@
static int delta_base_offset = 1;
static int pack_kept_objects = -1;
static int write_bitmaps;
static int use_delta_islands;
static char *packdir, *packtmp;
static const char *const git_repack_usage[] = {
@ -43,6 +44,10 @@ static int repack_config(const char *var, const char *value, void *cb)
write_bitmaps = git_config_bool(var, value);
return 0;
}
if (!strcmp(var, "repack.usedeltaislands")) {
use_delta_islands = git_config_bool(var, value);
return 0;
}
return git_default_config(var, value, cb);
}
@ -303,6 +308,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
N_("pass --local to git-pack-objects")),
OPT_BOOL('b', "write-bitmap-index", &write_bitmaps,
N_("write bitmap index")),
OPT_BOOL('i', "delta-islands", &use_delta_islands,
N_("pass --delta-islands to git-pack-objects")),
OPT_STRING(0, "unpack-unreachable", &unpack_unreachable, N_("approxidate"),
N_("with -A, do not loosen objects older than this")),
OPT_BOOL('k', "keep-unreachable", &keep_unreachable,
@ -363,6 +370,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
argv_array_push(&cmd.args, "--exclude-promisor-objects");
if (write_bitmaps)
argv_array_push(&cmd.args, "--write-bitmap-index");
if (use_delta_islands)
argv_array_push(&cmd.args, "--delta-islands");
if (pack_everything & ALL_INTO_ONE) {
get_non_kept_pack_filenames(&existing_packs, &keep_pack_list);

delta-islands.c (new file, 502 lines)

@ -0,0 +1,502 @@
#include "cache.h"
#include "attr.h"
#include "object.h"
#include "blob.h"
#include "commit.h"
#include "tag.h"
#include "tree.h"
#include "delta.h"
#include "pack.h"
#include "tree-walk.h"
#include "diff.h"
#include "revision.h"
#include "list-objects.h"
#include "progress.h"
#include "refs.h"
#include "khash.h"
#include "pack-bitmap.h"
#include "pack-objects.h"
#include "delta-islands.h"
#include "sha1-array.h"
#include "config.h"
KHASH_INIT(str, const char *, void *, 1, kh_str_hash_func, kh_str_hash_equal)
static khash_sha1 *island_marks;
static unsigned island_counter;
static unsigned island_counter_core;
static kh_str_t *remote_islands;
struct remote_island {
uint64_t hash;
struct oid_array oids;
};
struct island_bitmap {
uint32_t refcount;
uint32_t bits[FLEX_ARRAY];
};
static uint32_t island_bitmap_size;
/*
* Allocate a new bitmap; if "old" is not NULL, the new bitmap will be a copy
* of "old". Otherwise, the new bitmap is empty.
*/
static struct island_bitmap *island_bitmap_new(const struct island_bitmap *old)
{
size_t size = sizeof(struct island_bitmap) + (island_bitmap_size * 4);
struct island_bitmap *b = xcalloc(1, size);
if (old)
memcpy(b, old, size);
b->refcount = 1;
return b;
}
static void island_bitmap_or(struct island_bitmap *a, const struct island_bitmap *b)
{
uint32_t i;
for (i = 0; i < island_bitmap_size; ++i)
a->bits[i] |= b->bits[i];
}
static int island_bitmap_is_subset(struct island_bitmap *self,
struct island_bitmap *super)
{
uint32_t i;
if (self == super)
return 1;
for (i = 0; i < island_bitmap_size; ++i) {
if ((self->bits[i] & super->bits[i]) != self->bits[i])
return 0;
}
return 1;
}
#define ISLAND_BITMAP_BLOCK(x) (x / 32)
#define ISLAND_BITMAP_MASK(x) (1 << (x % 32))
static void island_bitmap_set(struct island_bitmap *self, uint32_t i)
{
self->bits[ISLAND_BITMAP_BLOCK(i)] |= ISLAND_BITMAP_MASK(i);
}
static int island_bitmap_get(struct island_bitmap *self, uint32_t i)
{
return (self->bits[ISLAND_BITMAP_BLOCK(i)] & ISLAND_BITMAP_MASK(i)) != 0;
}
int in_same_island(const struct object_id *trg_oid, const struct object_id *src_oid)
{
khiter_t trg_pos, src_pos;
/* If we aren't using islands, assume everything goes together. */
if (!island_marks)
return 1;
/*
* If we don't have a bitmap for the target, we can delta it
* against anything -- it's not an important object
*/
trg_pos = kh_get_sha1(island_marks, trg_oid->hash);
if (trg_pos >= kh_end(island_marks))
return 1;
/*
* if the source (our delta base) doesn't have a bitmap,
* we don't want to base any deltas on it!
*/
src_pos = kh_get_sha1(island_marks, src_oid->hash);
if (src_pos >= kh_end(island_marks))
return 0;
return island_bitmap_is_subset(kh_value(island_marks, trg_pos),
kh_value(island_marks, src_pos));
}
int island_delta_cmp(const struct object_id *a, const struct object_id *b)
{
khiter_t a_pos, b_pos;
struct island_bitmap *a_bitmap = NULL, *b_bitmap = NULL;
if (!island_marks)
return 0;
a_pos = kh_get_sha1(island_marks, a->hash);
if (a_pos < kh_end(island_marks))
a_bitmap = kh_value(island_marks, a_pos);
b_pos = kh_get_sha1(island_marks, b->hash);
if (b_pos < kh_end(island_marks))
b_bitmap = kh_value(island_marks, b_pos);
if (a_bitmap) {
if (!b_bitmap || !island_bitmap_is_subset(a_bitmap, b_bitmap))
return -1;
}
if (b_bitmap) {
if (!a_bitmap || !island_bitmap_is_subset(b_bitmap, a_bitmap))
return 1;
}
return 0;
}
static struct island_bitmap *create_or_get_island_marks(struct object *obj)
{
khiter_t pos;
int hash_ret;
pos = kh_put_sha1(island_marks, obj->oid.hash, &hash_ret);
if (hash_ret)
kh_value(island_marks, pos) = island_bitmap_new(NULL);
return kh_value(island_marks, pos);
}
static void set_island_marks(struct object *obj, struct island_bitmap *marks)
{
struct island_bitmap *b;
khiter_t pos;
int hash_ret;
pos = kh_put_sha1(island_marks, obj->oid.hash, &hash_ret);
if (hash_ret) {
/*
* We don't have one yet; make a copy-on-write of the
* parent.
*/
marks->refcount++;
kh_value(island_marks, pos) = marks;
return;
}
/*
* We do have it. Make sure we split any copy-on-write before
* updating.
*/
b = kh_value(island_marks, pos);
if (b->refcount > 1) {
b->refcount--;
b = kh_value(island_marks, pos) = island_bitmap_new(b);
}
island_bitmap_or(b, marks);
}
static void mark_remote_island_1(struct remote_island *rl, int is_core_island)
{
uint32_t i;
for (i = 0; i < rl->oids.nr; ++i) {
struct island_bitmap *marks;
struct object *obj = parse_object(the_repository, &rl->oids.oid[i]);
if (!obj)
continue;
marks = create_or_get_island_marks(obj);
island_bitmap_set(marks, island_counter);
if (is_core_island && obj->type == OBJ_COMMIT)
obj->flags |= NEEDS_BITMAP;
/* If it was a tag, also make sure we hit the underlying object. */
while (obj && obj->type == OBJ_TAG) {
obj = ((struct tag *)obj)->tagged;
if (obj) {
parse_object(the_repository, &obj->oid);
marks = create_or_get_island_marks(obj);
island_bitmap_set(marks, island_counter);
}
}
}
if (is_core_island)
island_counter_core = island_counter;
island_counter++;
}
struct tree_islands_todo {
struct object_entry *entry;
unsigned int depth;
};
static int tree_depth_compare(const void *a, const void *b)
{
const struct tree_islands_todo *todo_a = a;
const struct tree_islands_todo *todo_b = b;
return todo_a->depth - todo_b->depth;
}
void resolve_tree_islands(int progress, struct packing_data *to_pack)
{
struct progress *progress_state = NULL;
struct tree_islands_todo *todo;
int nr = 0;
int i;
if (!island_marks)
return;
/*
* We process only trees, as commits and tags have already been handled
* (and passed their marks on to root trees, as well). We must make sure
* to process them in order of increasing tree depth (shallowest first)
* so that marks propagate down the tree properly, even if a sub-tree is
* found in multiple parent trees.
*/
ALLOC_ARRAY(todo, to_pack->nr_objects);
for (i = 0; i < to_pack->nr_objects; i++) {
if (oe_type(&to_pack->objects[i]) == OBJ_TREE) {
todo[nr].entry = &to_pack->objects[i];
todo[nr].depth = oe_tree_depth(to_pack, &to_pack->objects[i]);
nr++;
}
}
QSORT(todo, nr, tree_depth_compare);
if (progress)
progress_state = start_progress(_("Propagating island marks"), nr);
for (i = 0; i < nr; i++) {
struct object_entry *ent = todo[i].entry;
struct island_bitmap *root_marks;
struct tree *tree;
struct tree_desc desc;
struct name_entry entry;
khiter_t pos;
pos = kh_get_sha1(island_marks, ent->idx.oid.hash);
if (pos >= kh_end(island_marks))
continue;
root_marks = kh_value(island_marks, pos);
tree = lookup_tree(the_repository, &ent->idx.oid);
if (!tree || parse_tree(tree) < 0)
die(_("bad tree object %s"), oid_to_hex(&ent->idx.oid));
init_tree_desc(&desc, tree->buffer, tree->size);
while (tree_entry(&desc, &entry)) {
struct object *obj;
if (S_ISGITLINK(entry.mode))
continue;
obj = lookup_object(the_repository, entry.oid->hash);
if (!obj)
continue;
set_island_marks(obj, root_marks);
}
free_tree_buffer(tree);
display_progress(progress_state, i+1);
}
stop_progress(&progress_state);
free(todo);
}
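The ordering used by `resolve_tree_islands()` can be illustrated in isolation: depth is the number of `/` separators in an entry's path, and trees are visited shallowest first so that a parent's marks are complete before they are copied to its children. The sketch below is a simplified stand-in; `struct todo`, `path_depth` and `sort_shallowest_first` are hypothetical names, and the comparator avoids unsigned subtraction on purpose.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical work item: a tree path plus its computed depth. */
struct todo {
	const char *path;
	unsigned depth;
};

/* Count '/' separators: "a" is depth 0, "a/b/c" is depth 2. */
static unsigned path_depth(const char *path)
{
	unsigned depth = 0;
	const char *p;

	assert(path);
	for (p = strchr(path, '/'); p; p = strchr(p + 1, '/'))
		depth++;
	return depth;
}

/* Ascending depth; written without subtraction to avoid wraparound. */
static int by_depth(const void *a, const void *b)
{
	const struct todo *x = a, *y = b;

	return (x->depth > y->depth) - (x->depth < y->depth);
}

/* Fill in depths, then order the items shallowest first. */
static void sort_shallowest_first(struct todo *items, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		items[i].depth = path_depth(items[i].path);
	qsort(items, nr, sizeof(*items), by_depth);
}
```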
static regex_t *island_regexes;
static unsigned int island_regexes_alloc, island_regexes_nr;
static const char *core_island_name;
static int island_config_callback(const char *k, const char *v, void *cb)
{
if (!strcmp(k, "pack.island")) {
struct strbuf re = STRBUF_INIT;
if (!v)
return config_error_nonbool(k);
ALLOC_GROW(island_regexes, island_regexes_nr + 1, island_regexes_alloc);
if (*v != '^')
strbuf_addch(&re, '^');
strbuf_addstr(&re, v);
if (regcomp(&island_regexes[island_regexes_nr], re.buf, REG_EXTENDED))
die(_("failed to load island regex for '%s': %s"), k, re.buf);
strbuf_release(&re);
island_regexes_nr++;
return 0;
}
if (!strcmp(k, "pack.islandcore"))
return git_config_string(&core_island_name, k, v);
return 0;
}
static void add_ref_to_island(const char *island_name, const struct object_id *oid)
{
uint64_t sha_core;
struct remote_island *rl = NULL;
int hash_ret;
khiter_t pos = kh_put_str(remote_islands, island_name, &hash_ret);
if (hash_ret) {
kh_key(remote_islands, pos) = xstrdup(island_name);
kh_value(remote_islands, pos) = xcalloc(1, sizeof(struct remote_island));
}
rl = kh_value(remote_islands, pos);
oid_array_append(&rl->oids, oid);
memcpy(&sha_core, oid->hash, sizeof(uint64_t));
rl->hash += sha_core;
}
static int find_island_for_ref(const char *refname, const struct object_id *oid,
int flags, void *data)
{
/*
* We should advertise 'ARRAY_SIZE(matches) - 2' as the max,
* so we can diagnose below a config with more capture groups
* than we support.
*/
regmatch_t matches[16];
int i, m;
struct strbuf island_name = STRBUF_INIT;
/* walk backwards to get last-one-wins ordering */
for (i = island_regexes_nr - 1; i >= 0; i--) {
if (!regexec(&island_regexes[i], refname,
ARRAY_SIZE(matches), matches, 0))
break;
}
if (i < 0)
return 0;
if (matches[ARRAY_SIZE(matches) - 1].rm_so != -1)
warning(_("island regex from config has "
"too many capture groups (max=%d)"),
(int)ARRAY_SIZE(matches) - 2);
for (m = 1; m < ARRAY_SIZE(matches); m++) {
regmatch_t *match = &matches[m];
if (match->rm_so == -1)
continue;
if (island_name.len)
strbuf_addch(&island_name, '-');
strbuf_add(&island_name, refname + match->rm_so, match->rm_eo - match->rm_so);
}
add_ref_to_island(island_name.buf, oid);
strbuf_release(&island_name);
return 0;
}
static struct remote_island *get_core_island(void)
{
if (core_island_name) {
khiter_t pos = kh_get_str(remote_islands, core_island_name);
if (pos < kh_end(remote_islands))
return kh_value(remote_islands, pos);
}
return NULL;
}
static void deduplicate_islands(void)
{
struct remote_island *island, *core = NULL, **list;
unsigned int island_count, dst, src, ref, i = 0;
island_count = kh_size(remote_islands);
ALLOC_ARRAY(list, island_count);
kh_foreach_value(remote_islands, island, {
list[i++] = island;
});
for (ref = 0; ref + 1 < island_count; ref++) {
for (src = ref + 1, dst = src; src < island_count; src++) {
if (list[ref]->hash == list[src]->hash)
continue;
if (src != dst)
list[dst] = list[src];
dst++;
}
island_count = dst;
}
island_bitmap_size = (island_count / 32) + 1;
core = get_core_island();
for (i = 0; i < island_count; ++i) {
mark_remote_island_1(list[i], core && list[i]->hash == core->hash);
}
free(list);
}
void load_delta_islands(void)
{
island_marks = kh_init_sha1();
remote_islands = kh_init_str();
git_config(island_config_callback, NULL);
for_each_ref(find_island_for_ref, NULL);
deduplicate_islands();
fprintf(stderr, _("Marked %d islands, done.\n"), island_counter);
}
void propagate_island_marks(struct commit *commit)
{
khiter_t pos = kh_get_sha1(island_marks, commit->object.oid.hash);
if (pos < kh_end(island_marks)) {
struct commit_list *p;
struct island_bitmap *root_marks = kh_value(island_marks, pos);
parse_commit(commit);
set_island_marks(&get_commit_tree(commit)->object, root_marks);
for (p = commit->parents; p; p = p->next)
set_island_marks(&p->item->object, root_marks);
}
}
int compute_pack_layers(struct packing_data *to_pack)
{
uint32_t i;
if (!core_island_name || !island_marks)
return 1;
for (i = 0; i < to_pack->nr_objects; ++i) {
struct object_entry *entry = &to_pack->objects[i];
khiter_t pos = kh_get_sha1(island_marks, entry->idx.oid.hash);
oe_set_layer(to_pack, entry, 1);
if (pos < kh_end(island_marks)) {
struct island_bitmap *bitmap = kh_value(island_marks, pos);
if (island_bitmap_get(bitmap, island_counter_core))
oe_set_layer(to_pack, entry, 0);
}
}
return 2;
}
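The layering idea behind `compute_pack_layers()` and `compute_layer_order()` can be sketched on plain arrays: objects in the core island get layer 0, everything else gets layer 1, and the write order emits layer 0 before layer 1. The sketch assumes one 32-bit island mask per object; `assign_layers` and `layered_order` are hypothetical helpers, and the real write order applies further sorting within each layer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layer 0 for objects carrying the core island bit, layer 1 otherwise. */
static void assign_layers(const uint32_t *island_masks, unsigned char *layers,
			  size_t nr, unsigned core_bit)
{
	size_t i;

	assert(core_bit < 32);
	for (i = 0; i < nr; i++)
		layers[i] = ((island_masks[i] >> core_bit) & 1u) ? 0 : 1;
}

/*
 * Emit object indexes layer by layer, keeping the original relative
 * order inside each layer. Returns the number of indexes written.
 */
static size_t layered_order(const unsigned char *layers, size_t nr,
			    size_t *order)
{
	size_t n = 0, i;
	unsigned layer;

	for (layer = 0; layer < 2; layer++)
		for (i = 0; i < nr; i++)
			if (layers[i] == layer)
				order[n++] = i;
	return n;
}
```

This is what places the core island's objects at the front of the pack, forming the pseudo-pack described for `pack.islandCore`.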

delta-islands.h (new file, 11 lines)

@ -0,0 +1,11 @@
#ifndef DELTA_ISLANDS_H
#define DELTA_ISLANDS_H
int island_delta_cmp(const struct object_id *a, const struct object_id *b);
int in_same_island(const struct object_id *, const struct object_id *);
void resolve_tree_islands(int progress, struct packing_data *to_pack);
void load_delta_islands(void);
void propagate_island_marks(struct commit *commit);
int compute_pack_layers(struct packing_data *to_pack);
#endif /* DELTA_ISLANDS_H */

@ -164,6 +164,12 @@ struct object_entry *packlist_alloc(struct packing_data *pdata,
REALLOC_ARRAY(pdata->in_pack, pdata->nr_alloc);
if (pdata->delta_size)
REALLOC_ARRAY(pdata->delta_size, pdata->nr_alloc);
if (pdata->tree_depth)
REALLOC_ARRAY(pdata->tree_depth, pdata->nr_alloc);
if (pdata->layer)
REALLOC_ARRAY(pdata->layer, pdata->nr_alloc);
}
new_entry = pdata->objects + pdata->nr_objects++;
@ -179,6 +185,12 @@ struct object_entry *packlist_alloc(struct packing_data *pdata,
if (pdata->in_pack)
pdata->in_pack[pdata->nr_objects - 1] = NULL;
if (pdata->tree_depth)
pdata->tree_depth[pdata->nr_objects - 1] = 0;
if (pdata->layer)
pdata->layer[pdata->nr_objects - 1] = 0;
return new_entry;
}

@ -103,6 +103,7 @@ struct object_entry {
unsigned no_try_delta:1;
unsigned type_:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
unsigned preferred_base:1; /*
* we do not pack this, but is available
* to be used as the base object to delta
@ -158,6 +159,10 @@ struct packing_data {
uintmax_t oe_size_limit;
uintmax_t oe_delta_size_limit;
/* delta islands */
unsigned int *tree_depth;
unsigned char *layer;
};
void prepare_packing_data(struct packing_data *pdata);
@ -400,4 +405,38 @@ static inline void oe_set_delta_size(struct packing_data *pack,
}
}
static inline unsigned int oe_tree_depth(struct packing_data *pack,
struct object_entry *e)
{
if (!pack->tree_depth)
return 0;
return pack->tree_depth[e - pack->objects];
}
static inline void oe_set_tree_depth(struct packing_data *pack,
struct object_entry *e,
unsigned int tree_depth)
{
if (!pack->tree_depth)
ALLOC_ARRAY(pack->tree_depth, pack->nr_objects);
pack->tree_depth[e - pack->objects] = tree_depth;
}
static inline unsigned char oe_layer(struct packing_data *pack,
struct object_entry *e)
{
if (!pack->layer)
return 0;
return pack->layer[e - pack->objects];
}
static inline void oe_set_layer(struct packing_data *pack,
struct object_entry *e,
unsigned char layer)
{
if (!pack->layer)
ALLOC_ARRAY(pack->layer, pack->nr_objects);
pack->layer[e - pack->objects] = layer;
}
#endif

t/t5320-delta-islands.sh (new executable file, 143 lines)

@ -0,0 +1,143 @@
#!/bin/sh
test_description='exercise delta islands'
. ./test-lib.sh
# returns true iff $1 is a delta based on $2
is_delta_base () {
delta_base=$(echo "$1" | git cat-file --batch-check='%(deltabase)') &&
echo >&2 "$1 has base $delta_base" &&
test "$delta_base" = "$2"
}
# generate a commit on branch $1 with a single file, "file", whose
# content is mostly based on the seed $2, but with a unique bit
# of content $3 appended. This should allow us to see whether
# blobs of different refs delta against each other.
commit() {
blob=$({ test-tool genrandom "$2" 10240 && echo "$3"; } |
git hash-object -w --stdin) &&
tree=$(printf '100644 blob %s\tfile\n' "$blob" | git mktree) &&
commit=$(echo "$2-$3" | git commit-tree "$tree" ${4:+-p "$4"}) &&
git update-ref "refs/heads/$1" "$commit" &&
eval "$1"'=$(git rev-parse $1:file)' &&
eval "echo >&2 $1=\$$1"
}
test_expect_success 'setup commits' '
commit one seed 1 &&
commit two seed 12
'
# Note: This is heavily dependent on the "prefer larger objects as base"
# heuristic.
test_expect_success 'vanilla repack deltas one against two' '
git repack -adf &&
is_delta_base $one $two
'
test_expect_success 'island repack with no island definition is vanilla' '
git repack -adfi &&
is_delta_base $one $two
'
test_expect_success 'island repack with no matches is vanilla' '
git -c "pack.island=refs/foo" repack -adfi &&
is_delta_base $one $two
'
test_expect_success 'separate islands disallows delta' '
git -c "pack.island=refs/heads/(.*)" repack -adfi &&
! is_delta_base $one $two &&
! is_delta_base $two $one
'
test_expect_success 'same island allows delta' '
git -c "pack.island=refs/heads" repack -adfi &&
is_delta_base $one $two
'
test_expect_success 'coalesce same-named islands' '
git \
-c "pack.island=refs/(.*)/one" \
-c "pack.island=refs/(.*)/two" \
repack -adfi &&
is_delta_base $one $two
'
test_expect_success 'island restrictions drop reused deltas' '
git repack -adfi &&
is_delta_base $one $two &&
git -c "pack.island=refs/heads/(.*)" repack -adi &&
! is_delta_base $one $two &&
! is_delta_base $two $one
'
test_expect_success 'island regexes are left-anchored' '
git -c "pack.island=heads/(.*)" repack -adfi &&
is_delta_base $one $two
'
test_expect_success 'island regexes follow last-one-wins scheme' '
git \
-c "pack.island=refs/heads/(.*)" \
-c "pack.island=refs/heads/" \
repack -adfi &&
is_delta_base $one $two
'
test_expect_success 'setup shared history' '
commit root shared root &&
commit one shared 1 root &&
commit two shared 12-long root
'
# We know that $two will be preferred as a base from $one,
# because we can transform it with a pure deletion.
#
# We also expect $root as a delta against $two by the "longest is base" rule.
test_expect_success 'vanilla delta goes between branches' '
git repack -adf &&
is_delta_base $one $two &&
is_delta_base $root $two
'
# Here we should allow $one to base itself on $root; even though
# they are in different islands, the objects in $root are in a superset
# of islands compared to those in $one.
#
# Similarly, $two can delta against $root by our rules. And unlike $one,
# in which we are just allowing it, the island rules actually put $root
# as a possible base for $two, which it would not otherwise be (due to the size
# sorting).
test_expect_success 'deltas allowed against superset islands' '
git -c "pack.island=refs/heads/(.*)" repack -adfi &&
is_delta_base $one $root &&
is_delta_base $two $root
'
# We are going to test the packfile order here, so we again have to make some
# assumptions. We assume that "$root", as part of our core "one", must come
# before "$two". This should be guaranteed by the island code. However, for
# this test to fail without islands, we are also assuming that it would not
# otherwise do so. This is true by the current write order, which will put
# commits (and their contents) before their parents.
test_expect_success 'island core places core objects first' '
cat >expect <<-EOF &&
$root
$two
EOF
git -c "pack.island=refs/heads/(.*)" \
-c "pack.islandcore=one" \
repack -adfi &&
git verify-pack -v .git/objects/pack/*.pack |
cut -d" " -f1 |
egrep "$root|$two" >actual &&
test_cmp expect actual
'
test_expect_success 'unmatched island core is not fatal' '
git -c "pack.islandcore=one" repack -adfi
'
test_done