Commit graph

211 commits

Author SHA1 Message Date
Taylor Blau
4e58cedd94 builtin/repack.c: move .idx files into place last
In a similar spirit as the previous patch, fix the identical problem
from `git repack` (which invokes `pack-objects` with a temporary
location for output, and then moves the files into their final locations
itself).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 18:23:11 -07:00
Taylor Blau
ff1e653c8e midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
Introduce a new 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment
variable to also write a multi-pack bitmap when
'GIT_TEST_MULTI_PACK_INDEX' is set.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Rafael Silva
a643157d5a repack: avoid loosening promisor objects in partial clones
When `git repack -A -d` is run in a partial clone, `pack-objects`
is invoked twice: once to repack all promisor objects, and once to
repack all non-promisor objects. The latter `pack-objects` invocation
is with --exclude-promisor-objects and --unpack-unreachable, which
loosens all objects unused during this invocation. Unfortunately,
this includes promisor objects.

Because the -d argument to `git repack` subsequently deletes all loose
objects also in packs, these just-loosened promisor objects will be
immediately deleted. However, this extra disk churn is unnecessary in
the first place.  For example, in a newly-cloned partial repo that
filters all blob objects (e.g. `--filter=blob:none`), `repack` ends up
unpacking all trees and commits into the filesystem because every
object, in this particular case, is a promisor object. Depending on
the repo size, this increases the disk usage considerably: In my copy
of the linux.git, the object directory peaked 26GB of more disk usage.

In order to avoid this extra disk churn, pass the names of the promisor
packfiles as --keep-pack arguments to the second invocation of
`pack-objects`. This informs `pack-objects` that the promisor objects
are already in a safe packfile and, therefore, do not need to be
loosened.

For testing, we need to validate whether any object was loosened.
However, the "evidence" (loosened objects) is deleted during the
process which prevents us from inspecting the object directory.
Instead, let's teach `pack-objects` to count loosened objects and
emit via trace2 thus allowing inspecting the debug events after the
process is finished. This new event is used on the added regression
test.

Lastly, add a new perf test to evaluate the performance impact
made by this changes (tested on git.git):

     Test          HEAD^                 HEAD
     ----------------------------------------------------------
     5600.3: gc    134.38(41.93+90.95)   7.80(6.72+1.35) -94.2%

For a bigger repository, such as linux.git, the improvement is
even bigger:

     Test          HEAD^                     HEAD
     -------------------------------------------------------------------
     5600.3: gc    6833.00(918.07+3162.74)   268.79(227.02+39.18) -96.1%

These improvements are particular big because every object in the
newly-cloned partial repository is a promisor object.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-28 13:36:13 +09:00
Junio C Hamano
e6b971fcf5 Merge branch 'tb/reverse-midx'
An on-disk reverse-index to map the in-pack location of an object
back to its object name across multiple packfiles is introduced.

* tb/reverse-midx:
  midx.c: improve cache locality in midx_pack_order_cmp()
  pack-revindex: write multi-pack reverse indexes
  pack-write.c: extract 'write_rev_file_order'
  pack-revindex: read multi-pack reverse indexes
  Documentation/technical: describe multi-pack reverse indexes
  midx: make some functions non-static
  midx: keep track of the checksum
  midx: don't free midx_name early
  midx: allow marking a pack as preferred
  t/helper/test-read-midx.c: add '--show-objects'
  builtin/multi-pack-index.c: display usage on unrecognized command
  builtin/multi-pack-index.c: don't enter bogus cmd_mode
  builtin/multi-pack-index.c: split sub-commands
  builtin/multi-pack-index.c: define common usage with a macro
  builtin/multi-pack-index.c: don't handle 'progress' separately
  builtin/multi-pack-index.c: inline 'flags' with options
2021-04-08 13:23:25 -07:00
Taylor Blau
9218c6a40c midx: allow marking a pack as preferred
When multiple packs in the multi-pack index contain the same object, the
MIDX machinery must make a choice about which pack it associates with
that object. Prior to this patch, the lowest-ordered[1] pack was always
selected.

Pack selection for duplicate objects is relatively unimportant today,
but it will become important for multi-pack bitmaps. This is because we
can only invoke the pack-reuse mechanism when all of the bits for reused
objects come from the reuse pack (in order to ensure that all reused
deltas can find their base objects in the same pack).

To encourage the pack selection process to prefer one pack over another
(the pack to be preferred is the one a caller would like to later use as
a reuse pack), introduce the concept of a "preferred pack". When
provided, the MIDX code will always prefer an object found in a
preferred pack over any other.

No format changes are required to store the preferred pack, since it
will be able to be inferred with a corresponding MIDX bitmap, by looking
up the pack associated with the object in the first bit position (this
ordering is described in detail in a subsequent commit).

[1]: the ordering is specified by MIDX internals; for our purposes we
can consider the "lowest ordered" pack to be "the one with the
most-recent mtime.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-01 13:07:37 -07:00
Junio C Hamano
ccae01cab8 builtin/repack.c: reword comment around pack-objects flags
The comment in this block is meant to indicate that passing '--all',
'--reflog', and so on aren't necessary when repacking with the
'--geometric' option.

But, it has two problems: first, it is factually incorrect ('--all' is
*not* incompatible with '--stdin-packs' as the comment suggests);
second, it is quite focused on the geometric case for a block that is
guarding against it.

Reword this comment to address both issues.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-05 11:33:52 -08:00
Taylor Blau
2a15964128 builtin/repack.c: be more conservative with unsigned overflows
There are a number of places in the geometric repack code where we
multiply the number of objects in a pack by another unsigned value. We
trust that the number of objects in a pack is always representable by a
uint32_t, but we don't necessarily trust that that number can be
multiplied without overflow.

Sprinkle some unsigned_add_overflows() and unsigned_mult_overflows() in
split_pack_geometry() to check that we never overflow any unsigned types
when adding or multiplying them.

Arguably these checks are a little too conservative, and certainly they
do not help the readability of this function. But they are serving a
useful purpose, so I think they are worthwhile overall.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-05 11:33:52 -08:00
Taylor Blau
13d746a303 builtin/repack.c: assign pack split later
To determine the where to place the split when repacking with the
'--geometric' option, split_pack_geometry() assigns the "split" variable
and then decrements it in a loop.

It would be equivalent (and more readable) to assign the split to the
loop position after exiting the loop, so do that instead.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-05 11:33:52 -08:00
Taylor Blau
f25e33c156 builtin/repack.c: do not repack single packs with --geometric
In 0fabafd0b9 (builtin/repack.c: add '--geometric' option, 2021-02-22),
the 'git repack --geometric' code aborts early when there is zero or one
pack.

When there are no packs, this code does the right thing by placing the
split at "0". But when there is exactly one pack, the split is placed at
"1", which means that "git repack --geometric" (with any factor)
repacks all of the objects in a single pack.

This is wasteful, and the remaining code in split_pack_geometry() does
the right thing (not repacking the objects in a single pack) even when
only one pack is present.

Loosen the guard to only stop when there aren't any packs, and let the
rest of the code do the right thing. Add a test to ensure that this is
the case.

Noticed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-05 11:33:52 -08:00
Taylor Blau
0fabafd0b9 builtin/repack.c: add '--geometric' option
Often it is useful to both:

  - have relatively few packfiles in a repository, and

  - avoid having so few packfiles in a repository that we repack its
    entire contents regularly

This patch implements a '--geometric=<n>' option in 'git repack'. This
allows the caller to specify that they would like each pack to be at
least a factor times as large as the previous largest pack (by object
count).

Concretely, say that a repository has 'n' packfiles, labeled P1, P2,
..., up to Pn. Each packfile has an object count equal to 'objects(Pn)'.
With a geometric factor of 'r', it should be that:

  objects(Pi) > r*objects(P(i-1))

for all i in [1, n], where the packs are sorted by

  objects(P1) <= objects(P2) <= ... <= objects(Pn).

Since finding a true optimal repacking is NP-hard, we approximate it
along two directions:

  1. We assume that there is a cutoff of packs _before starting the
     repack_ where everything to the right of that cut-off already forms
     a geometric progression (or no cutoff exists and everything must be
     repacked).

  2. We assume that everything smaller than the cutoff count must be
     repacked. This forms our base assumption, but it can also cause
     even the "heavy" packs to get repacked, for e.g., if we have 6
     packs containing the following number of objects:

       1, 1, 1, 2, 4, 32

     then we would place the cutoff between '1, 1' and '1, 2, 4, 32',
     rolling up the first two packs into a pack with 2 objects. That
     breaks our progression and leaves us:

       2, 1, 2, 4, 32
         ^

     (where the '^' indicates the position of our split). To restore a
     progression, we move the split forward (towards larger packs)
     joining each pack into our new pack until a geometric progression
     is restored. Here, that looks like:

       2, 1, 2, 4, 32  ~>  3, 2, 4, 32  ~>  5, 4, 32  ~> ... ~> 9, 32
         ^                   ^                ^                   ^

This has the advantage of not repacking the heavy-side of packs too
often while also only creating one new pack at a time. Another wrinkle
is that we assume that loose, indexed, and reflog'd objects are
insignificant, and lump them into any new pack that we create. This can
lead to non-idempotent results.

Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 23:30:52 -08:00
Taylor Blau
2f4ba2a867 packfile: prepare for the existence of '*.rev' files
Specify the format of the on-disk reverse index 'pack-*.rev' file, as
well as prepare the code for the existence of such files.

The reverse index maps from pack relative positions (i.e., an index into
the array of object which is sorted by their offsets within the
packfile) to their position within the 'pack-*.idx' file. Today, this is
done by building up a list of (off_t, uint32_t) tuples for each object
(the off_t corresponding to that object's offset, and the uint32_t
corresponding to its position in the index). To convert between pack and
index position quickly, this array of tuples is radix sorted based on
its offset.

This has two major drawbacks:

First, the in-memory cost scales linearly with the number of objects in
a pack.  Each 'struct revindex_entry' is sizeof(off_t) +
sizeof(uint32_t) + padding bytes for a total of 16.

To observe this, force Git to load the reverse index by, for e.g.,
running 'git cat-file --batch-check="%(objectsize:disk)"'. When asking
for a single object in a fresh clone of the kernel, Git needs to
allocate 120+ MB of memory in order to hold the reverse index in memory.

Second, the cost to sort also scales with the size of the pack.
Luckily, this is a linear function since 'load_pack_revindex()' uses a
radix sort, but this cost still must be paid once per pack per process.

As an example, it takes ~60x longer to print the _size_ of an object as
it does to print that entire object's _contents_:

  Benchmark #1: git.compile cat-file --batch <obj
    Time (mean ± σ):       3.4 ms ±   0.1 ms    [User: 3.3 ms, System: 2.1 ms]
    Range (min … max):     3.2 ms …   3.7 ms    726 runs

  Benchmark #2: git.compile cat-file --batch-check="%(objectsize:disk)" <obj
    Time (mean ± σ):     210.3 ms ±   8.9 ms    [User: 188.2 ms, System: 23.2 ms]
    Range (min … max):   193.7 ms … 224.4 ms    13 runs

Instead, avoid computing and sorting the revindex once per process by
writing it to a file when the pack itself is generated.

The format is relatively straightforward. It contains an array of
uint32_t's, the length of which is equal to the number of objects in the
pack.  The ith entry in this table contains the index position of the
ith object in the pack, where "ith object in the pack" is determined by
pack offset.

One thing that the on-disk format does _not_ contain is the full (up to)
eight-byte offset corresponding to each object. This is something that
the in-memory revindex contains (it stores an off_t in 'struct
revindex_entry' along with the same uint32_t that the on-disk format
has). Omit it in the on-disk format, since knowing the index position
for some object is sufficient to get a constant-time lookup in the
pack-*.idx file to ask for an object's offset within the pack.

This trades off between the on-disk size of the 'pack-*.rev' file for
runtime to chase down the offset for some object. Even though the lookup
is constant time, the constant is heavier, since it can potentially
involve two pointer walks in v2 indexes (one to access the 4-byte offset
table, and potentially a second to access the double wide offset table).

Consider trying to map an object's pack offset to a relative position
within that pack. In a cold-cache scenario, more page faults occur while
switching between binary searching through the reverse index and
searching through the *.idx file for an object's offset. Sure enough,
with a cold cache (writing '3' into '/proc/sys/vm/drop_caches' after
'sync'ing), printing out the entire object's contents is still
marginally faster than printing its size:

  Benchmark #1: git.compile cat-file --batch-check="%(objectsize:disk)" <obj >/dev/null
    Time (mean ± σ):      22.6 ms ±   0.5 ms    [User: 2.4 ms, System: 7.9 ms]
    Range (min … max):    21.4 ms …  23.5 ms    41 runs

  Benchmark #2: git.compile cat-file --batch <obj >/dev/null
    Time (mean ± σ):      17.2 ms ±   0.7 ms    [User: 2.8 ms, System: 5.5 ms]
    Range (min … max):    15.6 ms …  18.2 ms    45 runs

(Numbers taken in the kernel after cheating and using the next patch to
generate a reverse index). There are a couple of approaches to improve
cold cache performance not pursued here:

  - We could include the object offsets in the reverse index format.
    Predictably, this does result in fewer page faults, but it triples
    the size of the file, while simultaneously duplicating a ton of data
    already available in the .idx file. (This was the original way I
    implemented the format, and it did show
    `--batch-check='%(objectsize:disk)'` winning out against `--batch`.)

    On the other hand, this increase in size also results in a large
    block-cache footprint, which could potentially hurt other workloads.

  - We could store the mapping from pack to index position in more
    cache-friendly way, like constructing a binary search tree from the
    table and writing the values in breadth-first order. This would
    result in much better locality, but the price you pay is trading
    O(1) lookup in 'pack_pos_to_index()' for an O(log n) one (since you
    can no longer directly index the table).

So, neither of these approaches are taken here. (Thankfully, the format
is versioned, so we are free to pursue these in the future.) But, cold
cache performance likely isn't interesting outside of one-off cases like
asking for the size of an object directly. In real-world usage, Git is
often performing many operations in the revindex (i.e., asking about
many objects rather than a single one).

The trade-off is worth it, since we will avoid the vast majority of the
cost of generating the revindex that the extra pointer chase will look
like noise in the following patch's benchmarks.

This patch describes the format and prepares callers (like in
pack-revindex.c) to be able to read *.rev files once they exist. An
implementation of the writer will appear in the next patch, and callers
will gradually begin to start using the writer in the patches that
follow after that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-25 18:32:43 -08:00
Christian Couder
33add2ad7d fetch-pack: refactor writing promisor file
Let's replace the 2 different pieces of code that write a
promisor file in 'builtin/repack.c' and 'fetch-pack.c'
with a new function called 'write_promisor_file()' in
'pack-write.c' and 'pack.h'.

This might also help us in the future, if we want to put
back the ref names and associated hashes that were in
the promisor files we are repacking in 'builtin/repack.c'
as suggested by a NEEDSWORK comment just above the code
we are refactoring.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-12 16:01:07 -08:00
Taylor Blau
2fcb03b52d builtin/repack.c: don't move existing packs out of the way
When 'git repack' creates a pack with the same name as any existing
pack, it moves the existing one to 'old-pack-xxx.{pack,idx,...}' and
then renames the new one into place.

Eventually, it would be nice to have 'git repack' allow for writing a
multi-pack index at the critical time (after the new packs have been
written / moved into place, but before the old ones have been deleted).
Guessing that this option might be called '--write-midx', this makes the
following situation (where repacks are issued back-to-back without any
new objects) impossible:

    $ git repack -adb
    $ git repack -adb --write-midx

In the second repack, the existing packs are overwritten verbatim with
the same rename-to-old sequence. At that point, the current MIDX is
invalidated, since it refers to now-missing packs. So that code wants to
be run after the MIDX is re-written. But (prior to this patch) the new
MIDX can't be written until the new packs are moved into place. So, we
have a circular dependency.

This is all hypothetical, since no code currently exists to write a MIDX
safely during a 'git repack' (the 'GIT_TEST_MULTI_PACK_INDEX' does so
unsafely). Putting hypothetical aside, though: why do we need to rename
existing packs to be prefixed with 'old-' anyway?

This behavior dates all the way back to 2ad47d6 (git-repack: Be
careful when updating the same pack as an existing one., 2006-06-25).
2ad47d6 is mainly concerned about a case where a newly written pack
would have a different structure than its index. This used to be
possible when the pack name was a hash of the set of objects. Under this
naming scheme, two packs that store the same set of objects could differ
in delta selection, object positioning, or both. If this happened, then
any such packs would be unreadable in the instant between copying the
new pack and new index (i.e., either the index or pack will be stale
depending on the order that they were copied).

But since 1190a1a (pack-objects: name pack files after trailer hash,
2013-12-05), this is no longer possible, since pack files are named not
after their logical contents (i.e., the set of objects), but by the
actual checksum of their contents. So, this old- behavior can safely go,
which allows us to avoid our circular dependency above.

In addition to avoiding the circular dependency, this patch also makes
'git repack' a lot simpler, since we don't have to deal with failures
encountered when renaming existing packs to be prefixed with 'old-'.

This patch is mostly limited to removing code paths that deal with the
'old' prefixing, with the exception of files that include the pack's
name in their own filename, like .idx, .bitmap, and related files. The
exception is that we want to continue to trust what pack-objects wrote.
That is, it is not the case that we pretend as if pack-objects didn't
write files identical to ones that already exist, but rather that we
respect what pack-objects wrote as the source of truth. That cuts two
ways:

  - If pack-objects produced an identical pack to one that already
    exists with a bitmap, but did not produce a bitmap, we remove the
    bitmap that already exists. (This behavior is codified in t7700.14).

  - If pack-objects produced an identical pack to one that already
    exists, we trust the just-written version of the coresponding .idx,
    .promisor, and other files over the ones that already exist. This
    ensures that we use the most up-to-date versions of this files,
    which is safe even in the face of format changes in, say, the .idx
    file (which would not be reflected in the .idx file's name).

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-17 13:31:55 -08:00
Taylor Blau
704c4a5c07 builtin/repack.c: keep track of what pack-objects wrote
In the subsequent commit, it will become useful to keep track of which
metadata files were written by pack-objects. We already do this to an
extent with the 'exts' array, which only is used in the context of
existing packs.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-16 15:57:44 -08:00
Jeff King
63f4d5cf57 repack: make "exts" array available outside cmd_repack()
We'll use it in a helper function soon.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-16 15:57:43 -08:00
Junio C Hamano
a31677dde3 Merge branch 'tb/repack-clearing-midx'
When a packfile is removed by "git repack", multi-pack-index gets
cleared; the code was taught to do so less aggressively by first
checking if the midx actually refers to a pack that no longer
exists.

* tb/repack-clearing-midx:
  midx: traverse the local MIDX first
  builtin/repack.c: invalidate MIDX only when necessary
2020-09-09 13:53:06 -07:00
Taylor Blau
59552fb3e2 midx: traverse the local MIDX first
When a repository has an alternate object directory configured, callers
can traverse through each alternate's MIDX by walking the '->next'
pointer.

But, when 'prepare_multi_pack_index_one()' loads multiple MIDXs, it
places the new ones at the front of this pointer chain, not at the end.
This can be confusing for callers such as 'git repack -ad', causing test
failures like in t7700.6 with 'GIT_TEST_MULTI_PACK_INDEX=1'.

The occurs when dropping a pack known to the local MIDX with alternates
configured that have their own MIDX. Since the alternate's MIDX is
returned via 'get_multi_pack_index()', 'midx_contains_pack()' returns
true (which is correct, since it traverses through the '->next' pointer
to find the MIDX in the chain that does contain the requested object).
But, we call 'clear_midx_file()' on 'the_repository', which drops the
MIDX at the path of the first MIDX in the chain, which (in the case of
t7700.6 is the one in the alternate).

This patch addresses that by:

  - placing the local MIDX first in the chain when calling
    'prepare_multi_pack_index_one()', and

  - introducing a new 'get_local_multi_pack_index()', which explicitly
    returns the repository-local MIDX, if any.

Don't impose an additional order on the MIDX's '->next' pointer beyond
that the first item in the chain must be local if one exists so that we
avoid a quadratic insertion.

Likewise, use 'get_local_multi_pack_index()' in
'remove_redundant_pack()' to fix the formerly broken t7700.6 when run
with 'GIT_TEST_MULTI_PACK_INDEX=1'.

Finally, note that the MIDX ordering invariant is only preserved by the
insertion order in 'prepare_packed_git()', which traverses through the
ODB's '->next' pointer, meaning we visit the local object store first.
This fragility makes this an undesirable long-term solution if more
callers are added, but it is acceptable for now since this is the only
caller.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-28 14:07:09 -07:00
Taylor Blau
e08f7bb093 builtin/repack.c: invalidate MIDX only when necessary
In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
learned to remove a multi-pack-index file if it added or removed a pack
from the object store.

This mechanism is a little over-eager, since it is only necessary to
drop a MIDX if 'git repack' removes a pack that the MIDX references.
Adding a pack outside of the MIDX does not require invalidating the
MIDX, and likewise for removing a pack the MIDX does not know about.

Teach 'git repack' to check for this by loading the MIDX, and checking
whether the to-be-removed pack is known to the MIDX. This requires a
slightly odd alternation to a test in t5319, which is explained with a
comment. A new test is added to show that the MIDX is left alone when
both packs known to it are marked as .keep, but two packs unknown to it
are removed and combined into one new pack.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-26 13:55:46 -07:00
Jeff King
f6d8942b1f strvec: fix indentation in renamed calls
Code which split an argv_array call across multiple lines, like:

  argv_array_pushl(&args, "one argument",
                   "another argument", "and more",
		   NULL);

was recently mechanically renamed to use strvec, which results in
mis-matched indentation like:

  strvec_pushl(&args, "one argument",
                   "another argument", "and more",
		   NULL);

Let's fix these up to align the arguments with the opening paren. I did
this manually by sifting through the results of:

  git jump grep 'strvec_.*,$'

and liberally applying my editor's auto-format. Most of the changes are
of the form shown above, though I also normalized a few that had
originally used a single-tab indentation (rather than our usual style of
aligning with the open paren). I also rewrapped a couple of obvious
cases (e.g., where previously too-long lines became short enough to fit
on one), but I wasn't aggressive about it. In cases broken to three or
more lines, the grouping of arguments is sometimes meaningful, and it
wasn't worth my time or reviewer time to ponder each case individually.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:18 -07:00
Jeff King
22f9b7f3f5 strvec: convert builtin/ callers away from argv_array name
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).

This patch converts all of the files in builtin/ to keep the diff to a
manageable size.

The conversion was done purely mechanically with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe '
    s/ARGV_ARRAY/STRVEC/g;
    s/argv_array/strvec/g;
  '

and then selectively staging files with "git add builtin/". We'll deal
with any indentation/style fallouts separately.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:18 -07:00
Jeff King
dbbcd44fb4 strvec: rename files from argv-array to strvec
This requires updating #include lines across the code-base, but that's
all fairly mechanical, and was done with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe 's/argv-array.h/strvec.h/'

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:17 -07:00
Junio C Hamano
896833b268 Merge branch 'tb/shallow-cleanup'
Code cleanup.

* tb/shallow-cleanup:
  shallow: use struct 'shallow_lock' for additional safety
  shallow.h: document '{commit,rollback}_shallow_file'
  shallow: extract a header file for shallow-related functions
  commit: make 'commit_graft_pos' non-static
2020-05-13 12:19:18 -07:00
Taylor Blau
120ad2b0f1 shallow: extract a header file for shallow-related functions
There are many functions in commit.h that are more related to shallow
repositories than they are to any sort of generic commit machinery.
Likely this began when there were only a few shallow-related functions,
and commit.h seemed a reasonable enough place to put them.

But, now there are a good number of shallow-related functions, and
placing them all in 'commit.h' doesn't make sense.

This patch extracts a 'shallow.h', which takes all of the declarations
from 'commit.h' for functions which already exist in 'shallow.c'. We
will bring the remaining shallow-related functions defined in 'commit.c'
in a subsequent patch.

For now, move only the ones that already are implemented in 'shallow.c',
and update the necessary includes.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-30 14:19:13 -07:00
Denton Liu
9460fd48b5 Lib-ify prune-packed
In builtin.h, there exists the distinctly lib-ish function
prune_packed_objects(). This function can currently only be called by
built-in commands but, unlike all of the other functions in the header,
it does not make sense to impose this restriction as the functionality
can be logically reused in libgit.

Extract this function into prune-packed.c so that related definitions
can exist clearly in their own header file.

While we're at it, clean up #includes that are unused.

This patch is best viewed with --color-moved.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-24 15:04:44 -07:00
Junio C Hamano
8f1119b988 Merge branch 'wb/midx-progress'
The code to generate multi-pack index learned to show (or not to
show) progress indicators.

* wb/midx-progress:
  multi-pack-index: add [--[no-]progress] option.
  midx: honor the MIDX_PROGRESS flag in midx_repack
  midx: honor the MIDX_PROGRESS flag in verify_midx_file
  midx: add progress to expire_midx_packs
  midx: add progress to write_midx_file
  midx: add MIDX_PROGRESS flag
2019-11-10 18:02:14 +09:00
William Baker
efbc3aee08 midx: add MIDX_PROGRESS flag
Add the MIDX_PROGRESS flag and update the
write|verify|expire|repack functions in midx.h
to accept a flags parameter.  The MIDX_PROGRESS
flag indicates whether the caller of the function
would like progress information to be displayed.
This patch only changes the method prototypes
and does not change the functionality. The
functionality change will be handled by a later patch.

Signed-off-by: William Baker <William.Baker@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-23 12:05:05 +09:00
Jonathan Tan
5374a290aa fetch-pack: write fetched refs to .promisor
The specification of promisor packfiles (in partial-clone.txt) states
that the .promisor files that accompany packfiles do not matter (just
like .keep files), so whenever a packfile is fetched from the promisor
remote, Git has been writing empty .promisor files. But these files
could contain more useful information.

So instead of writing empty files, write the refs fetched to these
files. This makes it easier to debug issues with partial clones, as we
can identify what refs (and their associated hashes) were fetched at the
time the packfile was downloaded, and if necessary, compare those hashes
against what the promisor remote reports now.

This is implemented by teaching fetch-pack to write its own non-empty
.promisor file whenever it knows the name of the pack's lockfile. This
covers the case wherein the user runs "git fetch" with an internal
protocol or HTTP protocol v2 (fetch_refs_via_pack() in transport.c sets
lock_pack) and with HTTP protocol v0/v1 (fetch_git() in remote-curl.c
passes "--lock-pack" to "fetch-pack").

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Acked-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-16 11:07:51 +09:00
Junio C Hamano
676278f8ea Merge branch 'bc/object-id-part17'
Preparation for SHA-256 upgrade continues.

* bc/object-id-part17: (26 commits)
  midx: switch to using the_hash_algo
  builtin/show-index: replace sha1_to_hex
  rerere: replace sha1_to_hex
  builtin/receive-pack: replace sha1_to_hex
  builtin/index-pack: replace sha1_to_hex
  packfile: replace sha1_to_hex
  wt-status: convert struct wt_status to object_id
  cache: remove null_sha1
  builtin/worktree: switch null_sha1 to null_oid
  builtin/repack: write object IDs of the proper length
  pack-write: use hash_to_hex when writing checksums
  sequencer: convert to use the_hash_algo
  bisect: switch to using the_hash_algo
  sha1-lookup: switch hard-coded constants to the_hash_algo
  config: use the_hash_algo in abbrev comparison
  combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo
  bundle: switch to use the_hash_algo
  connected: switch GIT_SHA1_HEXSZ to the_hash_algo
  show-index: switch hard-coded constants to the_hash_algo
  blame: remove needless comparison with GIT_SHA1_HEXSZ
  ...
2019-10-11 14:24:46 +09:00
Junio C Hamano
b9ac6c59b8 Merge branch 'cc/multi-promisor'
Teach the lazy clone machinery that there can be more than one
promisor remote and consult them in order when downloading missing
objects on demand.

* cc/multi-promisor:
  Move core_partial_clone_filter_default to promisor-remote.c
  Move repository_format_partial_clone to promisor-remote.c
  Remove fetch-object.{c,h} in favor of promisor-remote.{c,h}
  remote: add promisor and partial clone config to the doc
  partial-clone: add multiple remotes in the doc
  t0410: test fetching from many promisor remotes
  builtin/fetch: remove unique promisor remote limitation
  promisor-remote: parse remote.*.partialclonefilter
  Use promisor_remote_get_direct() and has_promisor_remote()
  promisor-remote: use repository_format_partial_clone
  promisor-remote: add promisor_remote_reinit()
  promisor-remote: implement promisor_remote_get_direct()
  Add initial support for many promisor remotes
  fetch-object: make functions return an error code
  t0410: remove pipes after git commands
2019-09-18 11:50:09 -07:00
brian m. carlson
dd336a5511 builtin/repack: write object IDs of the proper length
Use the_hash_algo when calling xwrite with a hex object ID so that the
proper amount of data is written.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19 15:04:58 -07:00
Jeff King
7ff024e7b3 repack: simplify handling of auto-bitmaps and .keep files
Commit 7328482253 (repack: disable bitmaps-by-default if .keep files
exist, 2019-06-29) taught repack to prefer disabling bitmaps to
duplicating objects (unless bitmaps were asked for explicitly).

But there's an easier way to do this: if we keep passing the
--honor-pack-keep flag to pack-objects when auto-enabling bitmaps, then
pack-objects already makes the same decision (it will disable bitmaps
rather than duplicate). Better still, pack-objects can actually decide
to do so based not just on the presence of a .keep file, but on whether
that .keep file actually impacts the new pack we're making (so if we're
racing with a push or fetch, for example, their temporary .keep file
will not block us from generating bitmaps if they haven't yet updated
their refs).

And because repack uses the --write-bitmap-index-quiet flag, we don't
have to worry about pack-objects generating confusing warnings when it
does see a .keep file. We can confirm this by tweaking the .keep test to
check repack's stderr.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-31 13:26:25 -07:00
Jeff King
25575015ca repack: silence warnings when auto-enabled bitmaps cannot be built
Depending on various config options, a full repack may not be able to
build a reachability bitmap index (e.g., if pack.packSizeLimit forces us
to write multiple packs). In these cases pack-objects may write a
warning to stderr.

Since 36eba0323d (repack: enable bitmaps by default on bare repos,
2019-03-14), we may generate these warnings even when the user did not
explicitly ask for bitmaps. This has two downsides:

  - it can be confusing, if they don't know what bitmaps are

  - a daemonized auto-gc will write this to its log file, and the
    presence of the warning may suppress further auto-gc (until
    gc.logExpiry has elapsed)

Let's have repack communicate to pack-objects that the choice to turn on
bitmaps was not made explicitly by the user, which in turn allows
pack-objects to suppress these warnings.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-31 13:15:51 -07:00
Junio C Hamano
d60dc1a0b3 Merge branch 'ew/repack-with-bitmaps-by-default'
Generation of pack bitmaps are now disabled when .keep files exist,
as these are mutually exclusive features.

* ew/repack-with-bitmaps-by-default:
  repack: disable bitmaps-by-default if .keep files exist
2019-07-19 11:30:22 -07:00
Junio C Hamano
4308d81d45 Merge branch 'ds/midx-expire-repack'
"git multi-pack-index" learned expire and repack subcommands.

* ds/midx-expire-repack:
  t5319: use 'test-tool path-utils' instead of 'ls -l'
  t5319-multi-pack-index.sh: test batch size zero
  midx: add test that 'expire' respects .keep files
  multi-pack-index: test expire while adding packs
  midx: implement midx_repack()
  multi-pack-index: prepare 'repack' subcommand
  multi-pack-index: implement 'expire' subcommand
  midx: refactor permutation logic and pack sorting
  midx: simplify computation of pack name lengths
  multi-pack-index: prepare for 'expire' subcommand
  Docs: rearrange subcommands for multi-pack-index
  repack: refactor pack deletion for future use
2019-07-19 11:30:19 -07:00
Junio C Hamano
5cb7c73589 Merge branch 'ds/close-object-store'
The commit-graph file is now part of the "files that the runtime
may keep open file descriptors on, all of which would need to be
closed when done with the object store", and the file descriptor to
an existing commit-graph file now is closed before "gc" finalizes a
new instance to replace it.

* ds/close-object-store:
  packfile: rename close_all_packs to close_object_store
  packfile: close commit-graph in close_all_packs
  commit-graph: use raw_object_store when closing
2019-07-09 15:25:37 -07:00
Eric Wong
7328482253 repack: disable bitmaps-by-default if .keep files exist
Bitmaps aren't useful with multiple packs, and users with
.keep files ended up with redundant packs when bitmaps
got enabled by default in bare repos.

So detect when .keep files exist and stop enabling bitmaps
by default in that case.

Wasteful (but otherwise harmless) race conditions with .keep files
documented by Jeff King still apply and there's a chance we'd
still end up with redundant data on the FS:

  https://public-inbox.org/git/20190623224244.GB1100@sigill.intra.peff.net/

v2: avoid subshell in test case, be multi-index aware

Fixes: 36eba0323d ("repack: enable bitmaps by default on bare repos")
Signed-off-by: Eric Wong <e@80x24.org>
Helped-by: Jeff King <peff@peff.net>
Reported-by: Janos Farkas <chexum@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-01 10:29:08 -07:00
Christian Couder
b14ed5adaf Use promisor_remote_get_direct() and has_promisor_remote()
Instead of using the repository_format_partial_clone global
and fetch_objects() directly, let's use has_promisor_remote()
and promisor_remote_get_direct().

This way all the configured promisor remotes will be taken
into account, not only the one specified by
extensions.partialClone.

Also when cloning or fetching using a partial clone filter,
remote.origin.promisor will be set to "true" instead of
setting extensions.partialClone to "origin". This makes it
possible to use many promisor remote just by fetching from
them.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25 14:05:37 -07:00
Derrick Stolee
2d511cfc0b packfile: rename close_all_packs to close_object_store
The close_all_packs() method is now responsible for more than just pack-files.
It also closes the commit-graph and the multi-pack-index. Rename the function
to be more descriptive of its larger role. The name also fits because the
input parameter is a raw_object_store.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:33:54 -07:00
Derrick Stolee
8434e85d5f repack: refactor pack deletion for future use
The repack builtin deletes redundant pack-files and their
associated .idx, .promisor, .bitmap, and .keep files. We will want
to re-use this logic in the future for other types of repack, so
pull the logic into 'unlink_pack_path()' in packfile.c.

The 'ignore_keep' parameter is enabled for the use in repack, but
will be important for a future caller.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-11 10:34:40 -07:00
Eric Wong
36eba0323d repack: enable bitmaps by default on bare repos
A typical use case for bare repos is for serving clones and
fetches to clients.  Enable bitmaps by default on bare repos to
make it easier for admins to host git repos in a performant way.

Signed-off-by: Eric Wong <e@80x24.org>
Helped-by: Jeff King <peff@peff.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-18 14:09:54 +09:00
Junio C Hamano
5104f8f1ac Merge branch 'js/gc-repack-close-before-remove'
"git gc" and "git repack" did not close the open packfiles that
they found unneeded before removing them, which didn't work on a
platform incapable of removing an open file.  This has been
corrected.

* js/gc-repack-close-before-remove:
  gc/repack: release packs when needed
2019-01-18 13:49:57 -08:00
Johannes Schindelin
5bdece0d70 gc/repack: release packs when needed
On Windows, files cannot be removed nor renamed if there are still
handles held by a process. To remedy that, we introduced the
close_all_packs() function.

Earlier, we made sure that the packs are released just before `git gc`
is spawned, in case that gc wants to remove no-longer needed packs.

But this developer forgot that gc itself also needs to let go of packs,
e.g. when consolidating all packs via the --aggressive option.

Likewise, `git repack -d` wants to delete obsolete packs and therefore
needs to close all pack handles, too.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-11 09:22:39 -08:00
Junio C Hamano
3813a89fae Merge branch 'nd/i18n'
More _("i18n") markings.

* nd/i18n:
  fsck: mark strings for translation
  fsck: reduce word legos to help i18n
  parse-options.c: mark more strings for translation
  parse-options.c: turn some die() to BUG()
  parse-options: replace opterror() with optname()
  repack: mark more strings for translation
  remote.c: mark messages for translation
  remote.c: turn some error() or die() to BUG()
  reflog: mark strings for translation
  read-cache.c: add missing colon separators
  read-cache.c: mark more strings for translation
  read-cache.c: turn die("internal error") to BUG()
  attr.c: mark more string for translation
  archive.c: mark more strings for translation
  alias.c: mark split_cmdline_strerror() strings for translation
  git.c: mark more strings for translation
2019-01-04 13:33:31 -08:00
Junio C Hamano
5fb9263295 Merge branch 'ds/test-multi-pack-index'
Tests for the recently introduced multi-pack index machinery.

* ds/test-multi-pack-index:
  packfile: close multi-pack-index in close_all_packs
  multi-pack-index: define GIT_TEST_MULTI_PACK_INDEX
  midx: close multi-pack-index on repack
  midx: fix broken free() in close_midx()
2018-11-13 22:37:19 +09:00
Nguyễn Thái Ngọc Duy
c83d950e59 repack: mark more strings for translation
Two strings are slightly updated to be consistent with the rest: die()
starts with lowercase.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12 14:47:09 +09:00
Junio C Hamano
ea100b6dcb Merge branch 'js/shallow-and-fetch-prune'
"git repack" in a shallow clone did not correctly update the
shallow points in the repository, leading to a repository that
does not pass fsck.

* js/shallow-and-fetch-prune:
  repack -ad: prune the list of shallow commits
  shallow: offer to prune only non-existing entries
  repack: point out a bug handling stale shallow info
2018-11-06 15:50:18 +09:00
Johannes Schindelin
5dcfbf564c repack -ad: prune the list of shallow commits
`git repack` can drop unreachable commits without further warning,
making the corresponding entries in `.git/shallow` invalid, which causes
serious problems when deepening the branches.

One scenario where unreachable commits are dropped by `git repack` is
when a `git fetch --prune` (or even a `git fetch` when a ref was
force-pushed in the meantime) can make a commit unreachable that was
reachable before.

Therefore it is not safe to assume that a `git repack -adlf` will keep
unreachable commits alone (under the assumption that they had not been
packed in the first place, which is an assumption at least some of Git's
code seems to make).

This is particularly important to keep in mind when looking at the
`.git/shallow` file: if any commits listed in that file become
unreachable, it is not a problem, but if they go missing, it *is* a
problem. One symptom of this problem is that a deepening fetch may now
fail with

	fatal: error in object: unshallow <commit-hash>

To avoid this problem, let's prune the shallow list in `git repack` when
the `-d` option is passed, unless `-A` is passed, too (which would force
the now-unreachable objects to be turned into loose objects instead of
being deleted). Additionally, we also need to take `--keep-reachable`
and `--unpack-unreachable=<date>` into account.

Note: an alternative solution discussed during the review of this patch
was to teach `git fetch` to simply ignore entries in .git/shallow if the
corresponding commits do not exist locally. A quick test, however,
revealed that the .git/shallow file is written during a shallow *clone*,
in which case the commits do not exist, either, but the "shallow" line
*does* need to be sent. Therefore, this approach would be a lot more
finicky than the approach presented by the this patch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-25 12:59:27 +09:00
Derrick Stolee
0465a50506 multi-pack-index: define GIT_TEST_MULTI_PACK_INDEX
The multi-pack-index feature is tested in isolation by
t5319-multi-pack-index.sh, but there are many more interesting
scenarios in the test suite surrounding pack-file data shapes
and interactions. Since the multi-pack-index is an optional
data structure, it does not make sense to include it by default
in those tests.

Instead, add a new GIT_TEST_MULTI_PACK_INDEX environment variable
that enables core.multiPackIndex and writes a multi-pack-index
after each 'git repack' command. This adds extra test coverage
when needed.

There are a few spots in the test suite that need to react to this
change:

* t5319-multi-pack-index.sh: there is a test that checks that
  'git repack' deletes the multi-pack-index. Disable the environment
  variable to ensure this still happens.

* t5310-pack-bitmaps.sh: One test moves a pack-file from the object
  directory to an alternate. This breaks the multi-pack-index, so
  delete the multi-pack-index at this point, if it exists.

* t9300-fast-import.sh: One test verifies the number of files in
  the .git/objects/pack directory is exactly 8. Exclude the
  multi-pack-index from this count so it is still 8 in all cases.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-22 10:42:46 +09:00
Derrick Stolee
1dcd9f2043 midx: close multi-pack-index on repack
When repacking, we may remove pack-files. This invalidates the
multi-pack-index (if it exists). Previously, we removed the
multi-pack-index file before removing any pack-file. In some cases,
the repack command may load the multi-pack-index into memory. This
may lead to later in-memory references to the non-existent pack-
files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-22 10:42:46 +09:00
brian m. carlson
2f0c9e9a9b builtin/repack: replace hard-coded constants
Note that while the error messages here are not translated, the end user
should never see them.  We invoke git pack-objects shortly before both
invocations, so we can be fairly certain that the data we're receiving
is in fact valid.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-15 12:53:15 +09:00
Junio C Hamano
f3504ea3dd Merge branch 'cc/delta-islands'
Lift code from GitHub to restrict delta computation so that an
object that exists in one fork is not made into a delta against
another object that does not appear in the same forked repository.

* cc/delta-islands:
  pack-objects: move 'layer' into 'struct packing_data'
  pack-objects: move tree_depth into 'struct packing_data'
  t5320: tests for delta islands
  repack: add delta-islands support
  pack-objects: add delta-islands support
  pack-objects: refactor code into compute_layer_order()
  Add delta-islands.{c,h}
2018-09-17 13:53:55 -07:00
Junio C Hamano
c00ba2233e Sync 'ds/multi-pack-index' to v2.19.0-rc0
* ds/multi-pack-index: (23 commits)
  midx: clear midx on repack
  packfile: skip loading index if in multi-pack-index
  midx: prevent duplicate packfile loads
  midx: use midx in approximate_object_count
  midx: use existing midx when writing new one
  midx: use midx in abbreviation calculations
  midx: read objects from multi-pack-index
  config: create core.multiPackIndex setting
  midx: write object offsets
  midx: write object id fanout chunk
  midx: write object ids in a chunk
  midx: sort and deduplicate objects from packfiles
  midx: read pack names into array
  multi-pack-index: write pack names in chunk
  multi-pack-index: read packfile list
  packfile: generalize pack directory list
  t5319: expand test data
  multi-pack-index: load into memory
  midx: write header information to lockfile
  multi-pack-index: add 'write' verb
  ...
2018-08-20 15:29:54 -07:00
Junio C Hamano
36f0f344e7 Merge branch 'jt/repack-promisor-packs'
After a partial clone, repeated fetches from promisor remote would
have accumulated many packfiles marked with .promisor bit without
getting them coalesced into fewer packfiles, hurting performance.
"git repack" now learned to repack them.

* jt/repack-promisor-packs:
  repack: repack promisor objects if -a or -A is set
  repack: refactor setup of pack-objects cmd
2018-08-20 12:40:31 -07:00
Jeff King
16d75fa48d repack: add delta-islands support
Implement simple support for --delta-islands option and
repack.useDeltaIslands config variable in git repack.

This allows users to setup delta islands in their config and
get the benefit of less disk usage while cloning and fetching
is still quite fast and not much more CPU intensive.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-16 10:51:17 -07:00
Jonathan Tan
5d19e8138d repack: repack promisor objects if -a or -A is set
Currently, repack does not touch promisor packfiles at all, potentially
causing the performance of repositories that have many such packfiles to
drop. Therefore, repack all promisor objects if invoked with -a or -A.

This is done by an additional invocation of pack-objects on all promisor
objects individually given, which takes care of deduplication and allows
the resulting packfiles to respect flags such as --max-pack-size.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-09 09:17:39 -07:00
Jonathan Tan
2b958e790b repack: refactor setup of pack-objects cmd
A subsequent patch will teach repack to run pack-objects with some same
and some different arguments if repacking of promisor objects is
required. Refactor the setup of the pack-objects cmd so that setting up
the arguments common to both is done in a function.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-09 09:17:32 -07:00
Derrick Stolee
525e18c04b midx: clear midx on repack
If a 'git repack' command replaces existing packfiles, then we must
clear the existing multi-pack-index before moving the packfiles it
references.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-20 11:27:29 -07:00
Nguyễn Thái Ngọc Duy
ed7e5fc3a2 repack: add --keep-pack option
We allow to keep existing packs by having companion .keep files. This
is helpful when a pack is permanently kept. In the next patch, git-gc
just wants to keep a pack temporarily, for one pack-objects
run. git-gc can use --keep-pack for this use case.

A note about why the pack_keep field cannot be reused and
pack_keep_in_core has to be added. This is about the case when
--keep-pack is specified together with either --keep-unreachable or
--unpack-unreachable, but --honor-pack-keep is NOT specified.

In this case, we want to exclude objects from the packs specified on
command line, not from ones with .keep files. If only one bit flag is
used, we have to clear pack_keep on pack files with the .keep file.

But we can't make any assumption about unreachable objects in .keep
packs. If "pack_keep" field is false for .keep packs, we could
potentially pull lots of unreachable objects into the new pack, or
unpack them loose. The safer approach is ignore all packs with either
.keep file or --keep-pack.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 13:52:29 +09:00
Jonathan Tan
0c16cd499d gc: do not repack promisor packfiles
Teach gc to stop traversal at promisor objects, and to leave promisor
packfiles alone. This has the effect of only repacking non-promisor
packfiles, and preserves the distinction between promisor packfiles and
non-promisor packfiles.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:52:42 -08:00
Junio C Hamano
f31d23a399 Merge branch 'bw/config-h'
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.

* bw/config-h:
  config: don't implicitly use gitdir or commondir
  config: respect commondir
  setup: teach discover_git_directory to respect the commondir
  config: don't include config.h by default
  config: remove git_config_iter
  config: create config.h
2017-06-24 14:28:41 -07:00
Brandon Williams
b2141fc1d2 config: don't include config.h by default
Stop including config.h by default in cache.h.  Instead only include
config.h in those files which require use of the config system.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 12:56:22 -07:00
Junio C Hamano
40bcf3188a repack: accept --threads=<n> and pass it down to pack-objects
We already do so for --window=<n> and --depth=<n>; this will help
when the user wants to force --threads=1 for reproducible testing
without getting affected by racing multiple threads.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-27 08:09:25 +09:00
David Turner
1c409a705c repack: die on incremental + write-bitmap-index
The bitmap index only works for single packs, so requesting an
incremental repack with bitmap indexes makes no sense.

Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-29 13:45:37 -08:00
Junio C Hamano
2703572b3a Merge branch 'va/i18n-even-more'
More markings of messages for i18n, with updates to various tests
to pass GETTEXT_POISON tests.

One patch from the original submission dropped due to conflicts
with jk/upload-pack-hook, which is still in flux.

* va/i18n-even-more: (38 commits)
  t5541: become resilient to GETTEXT_POISON
  i18n: branch: mark comment when editing branch description for translation
  i18n: unmark die messages for translation
  i18n: submodule: escape shell variables inside eval_gettext
  i18n: submodule: join strings marked for translation
  i18n: init-db: join message pieces
  i18n: remote: allow translations to reorder message
  i18n: remote: mark URL fallback text for translation
  i18n: standardise messages
  i18n: sequencer: add period to error message
  i18n: merge: change command option help to lowercase
  i18n: merge: mark messages for translation
  i18n: notes: mark options for translation
  i18n: notes: mark strings for translation
  i18n: transport-helper.c: change N_() call to _()
  i18n: bisect: mark strings for translation
  t5523: use test_i18ngrep for negation
  t4153: fix negated test_i18ngrep call
  t9003: become resilient to GETTEXT_POISON
  tests: unpack-trees: update to use test_i18n* functions
  ...
2016-07-13 11:24:10 -07:00
Vasco Almeida
e923a8abe9 i18n: standardise messages
Standardise messages in order to save translators some work.

Nuances fixed in this commit:
"failed to read %s"
"read of %s failed"

"detach the HEAD at named commit"
"detach HEAD at named commit"

"removing '%s' failed"
"failed to remove '%s'"

"index file corrupt"
"corrupt index file"

"failed to read %s"
"read of %s failed"

"detach the HEAD at named commit"
"detach HEAD at named commit"

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-17 15:46:10 -07:00
Jeff King
e26a8c4721 repack: extend --keep-unreachable to loose objects
If you use "repack -adk" currently, we will pack all objects
that are already packed into the new pack, and then drop the
old packs. However, loose unreachable objects will be left
as-is. In theory these are meant to expire eventually with
"git prune". But if you are using "repack -k", you probably
want to keep things forever and therefore do not run "git
prune" at all. Meaning those loose objects may build up over
time and end up fooling any object-count heuristics (such as
the one done by "gc --auto", though since git-gc does not
support "repack -k", this really applies to whatever custom
scripts people might have driving "repack -k").

With this patch, we instead stuff any loose unreachable
objects into the pack along with the already-packed
unreachable objects. This may seem wasteful, but it is
really no more so than using "repack -k" in the first place.
We are at a slight disadvantage, in that we have no useful
ordering for the result, or names to hand to the delta code.
However, this is again no worse than what "repack -k" is
already doing for the packed objects. The packing of these
objects doesn't matter much because they should not be
accessed frequently (unless they actually _do_ become
referenced, but then they would get moved to a different
part of the packfile during the next repack).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-14 13:57:45 -07:00
Jeff King
905f27b86a repack: add --keep-unreachable option
The usual way to do a full repack (and what is done by
git-gc) is to run "repack -Ad --unpack-unreachable=<when>",
which will loosen any unreachable objects newer than
"<when>", and drop any older ones.

This is a safer alternative to "repack -ad", because
"<when>" becomes a grace period during which we will not
drop any new objects that are about to be referenced.
However, it isn't perfectly safe. It's always possible that
a process is about to reference an old object. Even if that
process were to take care to update the timestamp on the
object, there is no atomicity with a simultaneously running
"repack" process.

So while unlikely, there is a small race wherein we may drop
an object that is in the process of being referenced. If you
do automated repacking on a large number of active
repositories, you may hit it eventually, and the result is a
corrupted repository.

It would be nice to fix that race in the long run, but it's
complicated.  In the meantime, there is a much simpler
strategy for automated repository maintenance: do not drop
objects at all. We already have a "--keep-unreachable"
option in pack-objects; we just need to plumb it through
from git-repack.

Note that this _isn't_ plumbed through from git-gc, so at
this point it's strictly a tool for people doing their own
advanced repository maintenance strategy.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-14 13:57:42 -07:00
Junio C Hamano
8f309aeb82 strbuf: introduce strbuf_getline_{lf,nul}()
The strbuf_getline() interface allows a byte other than LF or NUL as
the line terminator, but this is only because I wrote these
codepaths anticipating that there might be a value other than NUL
and LF that could be useful when I introduced line_termination long
time ago.  No useful caller that uses other value has emerged.

By now, it is clear that the interface is overly broad without a
good reason.  Many codepaths have hardcoded preference to read
either LF terminated or NUL terminated records from their input, and
then call strbuf_getline() with LF or NUL as the third parameter.

This step introduces two thin wrappers around strbuf_getline(),
namely, strbuf_getline_lf() and strbuf_getline_nul(), and
mechanically rewrites these call sites to call either one of
them.  The changes contained in this patch are:

 * introduction of these two functions in strbuf.[ch]

 * mechanical conversion of all callers to strbuf_getline() with
   either '\n' or '\0' as the third parameter to instead call the
   respective thin wrapper.

After this step, output from "git grep 'strbuf_getline('" would
become a lot smaller.  An interim goal of this series is to make
this an empty set, so that we can have strbuf_getline_crlf() take
over the shorter name strbuf_getline().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-15 10:12:51 -08:00
Junio C Hamano
fa46579555 Merge branch 'jk/repository-extension'
Prepare for Git on-disk repository representation to undergo
backward incompatible changes by introducing a new repository
format version "1", with an extension mechanism.

* jk/repository-extension:
  introduce "preciousObjects" repository extension
  introduce "extensions" form of core.repositoryformatversion
2015-10-26 15:55:25 -07:00
Jeff King
e3cf230324 prefer mkpathdup to mkpath in assignments
As with the previous commit to git_path, assigning the
result of mkpath is suspicious, since it is not clear
whether we will still depend on the value after it may have
been overwritten by subsequent calls. This patch converts
low-hanging fruit to use mkpathdup instead of mkpath (with
the downside that we must remember to free the result).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 15:37:12 -07:00
Jeff King
067fbd4105 introduce "preciousObjects" repository extension
If this extension is used in a repository, then no
operations should run which may drop objects from the object
storage. This can be useful if you are sharing that storage
with other repositories whose refs you cannot see.

For instance, if you do:

  $ git clone -s parent child
  $ git -C parent config extensions.preciousObjects true
  $ git -C parent config core.repositoryformatversion 1

you now have additional safety when running git in the
parent repository. Prunes and repacks will bail with an
error, and `git gc` will skip those operations (it will
continue to pack refs and do other non-object operations).
Older versions of git, when run in the repository, will
fail on every operation.

Note that we do not set the preciousObjects extension by
default when doing a "clone -s", as doing so breaks
backwards compatibility. It is a decision the user should
make explicitly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-24 17:09:35 -07:00
Junio C Hamano
68a2e6a2c8 Merge branch 'nd/multiple-work-trees'
A replacement for contrib/workdir/git-new-workdir that does not
rely on symbolic links and make sharing of objects and refs safer
by making the borrowee and borrowers aware of each other.

* nd/multiple-work-trees: (41 commits)
  prune --worktrees: fix expire vs worktree existence condition
  t1501: fix test with split index
  t2026: fix broken &&-chain
  t2026 needs procondition SANITY
  git-checkout.txt: a note about multiple checkout support for submodules
  checkout: add --ignore-other-wortrees
  checkout: pass whole struct to parse_branchname_arg instead of individual flags
  git-common-dir: make "modules/" per-working-directory directory
  checkout: do not fail if target is an empty directory
  t2025: add a test to make sure grafts is working from a linked checkout
  checkout: don't require a work tree when checking out into a new one
  git_path(): keep "info/sparse-checkout" per work-tree
  count-objects: report unused files in $GIT_DIR/worktrees/...
  gc: support prune --worktrees
  gc: factor out gc.pruneexpire parsing code
  gc: style change -- no SP before closing parenthesis
  checkout: clean up half-prepared directories in --to mode
  checkout: reject if the branch is already checked out elsewhere
  prune: strategies for linked checkouts
  checkout: support checking out into a new working directory
  ...
2015-05-11 14:23:39 -07:00
Junio C Hamano
05e816e37f Merge branch 'jk/prune-with-corrupt-refs'
"git prune" used to largely ignore broken refs when deciding which
objects are still being used, which could spread an existing small
damage and make it a larger one.

* jk/prune-with-corrupt-refs:
  refs.c: drop curate_packed_refs
  repack: turn on "ref paranoia" when doing a destructive repack
  prune: turn on ref_paranoia flag
  refs: introduce a "ref paranoia" flag
  t5312: test object deletion code paths in a corrupted repository
2015-03-25 12:54:26 -07:00
Jeff King
8d42299361 repack: turn on "ref paranoia" when doing a destructive repack
If we are repacking with "-ad", we will drop any unreachable
objects. Likewise, using "-Ad --unpack-unreachable=<time>"
will drop any old, unreachable objects. In these cases, we
want to make sure the reachability we compute with "--all"
is complete. We can do this by passing GIT_REF_PARANOIA=1 in
the environment to pack-objects.

Note that "-Ad" is safe already, because it only loosens
unreachable objects. It is up to "git prune" to avoid
deleting them.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20 12:41:38 -07:00
Alex Henrie
9c9b4f2f8b standardize usage info string format
This patch puts the usage info strings that were not already in docopt-
like format into docopt-like format, which will be a litle easier for
end users and a lot easier for translators. Changes include:

- Placing angle brackets around fill-in-the-blank parameters
- Putting dashes in multiword parameter names
- Adding spaces to [-f|--foobar] to make [-f | --foobar]
- Replacing <foobar>* with [<foobar>...]

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-14 09:32:04 -08:00
Junio C Hamano
a7ddaa8eac Merge branch 'mh/simplify-repack-without-refs'
"git remote update --prune" to drop many refs has been optimized.

* mh/simplify-repack-without-refs:
  sort_string_list(): rename to string_list_sort()
  prune_remote(): iterate using for_each_string_list_item()
  prune_remote(): rename local variable
  repack_without_refs(): make the refnames argument a string_list
  prune_remote(): sort delete_refs_list references en masse
  prune_remote(): initialize both delete_refs lists in a single loop
  prune_remote(): exit early if there are no stale references
2014-12-22 12:26:50 -08:00
Nguyễn Thái Ngọc Duy
dcf692625a path.c: make get_pathname() call sites return const char *
Before the previous commit, get_pathname returns an array of PATH_MAX
length. Even if git_path() and similar functions does not use the
whole array, git_path() caller can, in theory.

After the commit, get_pathname() may return a buffer that has just
enough room for the returned string and git_path() caller should never
write beyond that.

Make git_path(), mkpath() and git_path_submodule() return a const
buffer to make sure callers do not write in it at all.

This could have been part of the previous commit, but the "const"
conversion is too much distraction from the core changes in path.c.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:10 -08:00
Michael Haggerty
3383e19984 sort_string_list(): rename to string_list_sort()
The new name is more consistent with the names of other
string_list-related functions.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-25 10:11:34 -08:00
René Scharfe
a2bae2dce1 use args member of struct child_process
Convert users of struct child_process to using the managed argv_array
args instead of providing their own.  This shortens the code a bit and
ensures that the allocated memory is released automatically after use.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-10 10:04:13 -08:00
Jeff King
c90f9e13ab repack: pack objects mentioned by the index
When we pack all objects, we use only the objects reachable
from references and reflogs. This misses any objects which
are reachable from the index, but not yet referenced.

By itself this isn't a big deal; the objects can remain
loose until they are actually used in a commit. However, it
does create a problem when we drop packed but unreachable
objects. We try to optimize out the writing of objects that
we will immediately prune, which means we must follow the
same rules as prune in determining what is reachable. And
prune uses the index for this purpose.

This is rather uncommon in practice, as objects in the index
would not usually have been packed in the first place. But
it could happen in a sequence like:

  1. You make a commit on a branch that references blob X.

  2. You repack, moving X into the pack.

  3. You delete the branch (and its reflog), so that X is
     unreferenced.

  4. You "git add" blob X so that it is now referenced only
     by the index.

  5. You repack again with git-gc. The pack-objects we
     invoke will see that X is neither referenced nor
     recent and not bother loosening it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:07 -07:00
Junio C Hamano
c0f5f311db Merge branch 'jk/prune-packed-server-info'
Code cleanup.

* jk/prune-packed-server-info:
  repack: call prune_packed_objects() and update_server_info() directly
  server-info: clean up after writing info/packs
  make update-server-info more robust
  prune-packed: fix minor memory leak
2014-09-26 14:39:44 -07:00
René Scharfe
4489a480fd repack: call prune_packed_objects() and update_server_info() directly
Call the functions behind git prune-packed and git update-server-info
directly instead of using run_command().  This is shorter, easier and
quicker.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-15 11:39:58 -07:00
René Scharfe
d318027932 run-command: introduce CHILD_PROCESS_INIT
Most struct child_process variables are cleared using memset first after
declaration.  Provide a macro, CHILD_PROCESS_INIT, that can be used to
initialize them statically instead.  That's shorter, doesn't require a
function call and is slightly more readable (especially given that we
already have STRBUF_INIT, ARGV_ARRAY_INIT etc.).

Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-20 09:53:37 -07:00
Junio C Hamano
6e4094731a Merge branch 'jk/strip-suffix'
* jk/strip-suffix:
  prepare_packed_git_one: refactor duplicate-pack check
  verify-pack: use strbuf_strip_suffix
  strbuf: implement strbuf_strip_suffix
  index-pack: use strip_suffix to avoid magic numbers
  use strip_suffix instead of ends_with in simple cases
  replace has_extension with ends_with
  implement ends_with via strip_suffix
  add strip_suffix function
  sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one()
2014-07-16 11:26:00 -07:00
Jeff King
26936bfd9b use strip_suffix instead of ends_with in simple cases
When stripping a suffix like:

  if (ends_with(str, "foo"))
	buf = xmemdupz(str, strlen(str) - 3);

we can instead use strip_suffix to avoid the constant 3,
which must match the literal "foo" (we sometimes use
strlen("foo") instead, but that means we are repeating
ourselves). The example above becomes:

  if (strip_suffix(str, "foo", &len))
	buf = xmemdupz(str, len);

This also saves a strlen(), since we calculate the string
length when detecting the suffix.

Note that in some cases we also switch from xstrndup to
xmemdupz, which saves a further strlen call.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-30 13:43:17 -07:00
Junio C Hamano
25f3119000 Merge branch 'jk/repack-pack-writebitmaps-config'
* jk/repack-pack-writebitmaps-config:
  t7700: drop explicit --no-pack-kept-objects from .keep test
  repack: introduce repack.writeBitmaps config option
  repack: simplify handling of --write-bitmap-index
  pack-objects: stop respecting pack.writebitmaps
2014-06-25 12:23:19 -07:00
Junio C Hamano
b7ce583682 Merge branch 'jk/repack-pack-keep-objects'
Recent updates to "git repack" started to duplicate objects that
are in packfiles marked with .keep flag into the new packfile by
mistake.

* jk/repack-pack-keep-objects:
  repack: s/write_bitmap/&s/ in code
  repack: respect pack.writebitmaps
  repack: do not accidentally pack kept objects by default
2014-06-25 12:21:51 -07:00
Jeff King
71d76cb480 repack: introduce repack.writeBitmaps config option
We currently have pack.writeBitmaps, which originally
operated at the pack-objects level. This should really have
been a repack.* option from day one. Let's give it the more
sensible name, but keep the old version as a deprecated
synonym.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-10 14:05:19 -07:00
Jeff King
2bed2d47b4 repack: simplify handling of --write-bitmap-index
We previously needed to pass --no-write-bitmap-index
explicitly to pack-objects to override its reading of
pack.writebitmaps from the config. Now that it no longer
does so, we can assume that bitmaps are off by default, and
only turn them on when necessary. This also lets us avoid a
confusing tri-state flag for write_bitmaps.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-10 14:04:06 -07:00
Jeff King
d078d85bc3 repack: s/write_bitmap/&s/ in code
The config name is "writeBitmaps", so the internal variable
missing the plural is unnecessarily confusing to write.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-10 14:01:30 -07:00
Jeff King
3198b89fb2 repack: respect pack.writebitmaps
The config option to turn on bitmaps is read all the way
down in the plumbing of pack-objects. This makes it hard for
other options in the porcelain of repack to make decisions
based on the bitmap setting. For example,
repack.packKeptObjects tries to kick in by default only when
bitmaps are turned on. But it can't do so reliably because
it doesn't yet know whether we are using bitmaps.

This patch teaches repack to respect pack.writebitmaps. It
means we pass a redundant command-line flag to pack-objects,
but that's OK; it shouldn't affect the outcome.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-10 14:01:08 -07:00
Jeff King
64d3dc9468 repack: do not accidentally pack kept objects by default
Commit ee34a2b (repack: add `repack.packKeptObjects` config
var, 2014-03-03) added a flag which could duplicate kept
objects, but did not mean to turn it on by default. Instead,
the option is tied by default to the decision to write
bitmaps, like:

  if (pack_kept_objects < 0)
	  pack_kept_objects = write_bitmap;

after which we expect pack_kept_objects to be a boolean 0 or
1.  However, that assignment neglects that write_bitmap is
_also_ a tri-state with "-1" as the default, and with
neither option given, we accidentally turn the option on.

This patch is the minimal fix to restore the desired
behavior for the default state. Further patches will fix the
more complicated cases.

Note the update to t7700. It failed to turn on bitmaps,
meaning we were actually confirming the wrong behavior!

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-10 13:58:43 -07:00
Junio C Hamano
249d54b231 Merge branch 'jk/repack-pack-keep-objects'
* jk/repack-pack-keep-objects:
  repack: add `repack.packKeptObjects` config var
2014-03-18 13:50:29 -07:00
Jeff King
ee34a2bead repack: add repack.packKeptObjects config var
The git-repack command always passes `--honor-pack-keep`
to pack-objects. This has traditionally been a good thing,
as we do not want to duplicate those objects in a new pack,
and we are not going to delete the old pack.

However, when bitmaps are in use, it is important for a full
repack to include all reachable objects, even if they may be
duplicated in a .keep pack. Otherwise, we cannot generate
the bitmaps, as the on-disk format requires the set of
objects in the pack to be fully closed.

Even if the repository does not generally have .keep files,
a simultaneous push could cause a race condition in which a
.keep file exists at the moment of a repack. The repack may
try to include those objects in one of two situations:

  1. The pushed .keep pack contains objects that were
     already in the repository (e.g., blobs due to a revert of
     an old commit).

  2. Receive-pack updates the refs, making the objects
     reachable, but before it removes the .keep file, the
     repack runs.

In either case, we may prefer to duplicate some objects in
the new, full pack, and let the next repack (after the .keep
file is cleaned up) take care of removing them.

This patch introduces both a command-line and config option
to disable the `--honor-pack-keep` option.  By default, it
is triggered when pack.writeBitmaps (or `--write-bitmap-index`
is turned on), but specifying it explicitly can override the
behavior (e.g., in cases where you prefer .keep files to
bitmaps, but only when they are present).

Note that this option just disables the pack-objects
behavior. We still leave packs with a .keep in place, as we
do not necessarily know that we have duplicated all of their
objects.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-03 12:21:49 -08:00
Junio C Hamano
0f9e62e084 Merge branch 'jk/pack-bitmap'
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.

* jk/pack-bitmap: (26 commits)
  ewah: unconditionally ntohll ewah data
  ewah: support platforms that require aligned reads
  read-cache: use get_be32 instead of hand-rolled ntoh_l
  block-sha1: factor out get_be and put_be wrappers
  do not discard revindex when re-preparing packfiles
  pack-bitmap: implement optional name_hash cache
  t/perf: add tests for pack bitmaps
  t: add basic bitmap functionality tests
  count-objects: recognize .bitmap in garbage-checking
  repack: consider bitmaps when performing repacks
  repack: handle optional files created by pack-objects
  repack: turn exts array into array-of-struct
  repack: stop using magic number for ARRAY_SIZE(exts)
  pack-objects: implement bitmap writing
  rev-list: add bitmap mode to speed up object lists
  pack-objects: use bitmaps when packing objects
  pack-objects: split add_object_entry
  pack-bitmap: add support for bitmap indexes
  documentation: add documentation for the bitmap format
  ewah: compressed bitmap implementation
  ...
2014-02-27 14:01:48 -08:00
Junio C Hamano
2171c0c36f Merge branch 'tb/repack-fix-renames' (early part)
Finishing touches to the "rewrite repack in C" series.

* 'tb/repack-fix-renames' (early part):
  repack.c: rename and unlink pack file if it exists
2014-02-05 12:02:29 -08:00
Torsten Bögershausen
9d7fbfd204 repack.c: rename and unlink pack file if it exists
When a repo was fully repacked, and is repacked again, we may run
into the situation that "new" packfiles have the same name as
already existing ones (traditionally packfiles have been named after
the list of names of objects in them, so repacking all the objects
in a single pack would have produced a packfile with the same name).

The logic is to rename the existing ones into filename like
"old-XXX", create the new ones and then remove the "old-" ones.
When something went wrong in the middle, this sequence is rolled
back by renaming the "old-" files back.

The renaming into "old-" did not work as intended, because
file_exists() was done on "XXX", not "pack-XXX".  Also when rolling
back the change, the code tried to rename "old-pack-XXX" but the
saved ones are named "old-XXX", so this couldn't have worked.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-05 11:58:49 -08:00
Junio C Hamano
4110639865 Merge branch 'sb/repack-in-c'
"git repack --max-pack-size=8g" stopped being parsed correctly when
the command was reimplemented in C.

* sb/repack-in-c:
  repack: propagate pack-objects options as strings
  repack: make parsed string options const-correct
  repack: fix typo in max-pack-size option
2014-01-27 10:45:49 -08:00
Jeff King
b861e235bc repack: propagate pack-objects options as strings
In the original shell version of git-repack, any options
destined for pack-objects were left as strings, and passed
as a whole. Since the C rewrite in commit a1bbc6c (repack:
rewrite the shell script in C, 2013-09-15), we now parse
these values to integers internally, then reformat the
integers when passing the option to pack-objects.

This has the advantage that we catch format errors earlier
(i.e., when repack is invoked, rather than when pack-objects
is invoked).

It has three disadvantages, though:

  1. Our internal data types may not be the right size. In
     the case of "--window-memory" and "--max-pack-size",
     these are "unsigned long" in pack-objects, but we can
     only represent a regular "int".

  2. Our parsing routines might not be the same as those of
     pack-objects. For the two options above, pack-objects
     understands "100m" to mean "100 megabytes", but repack
     does not.

  3. We have to keep a sentinel value to know whether it is
     worth passing the option along. In the case of
     "--window-memory", we currently do not pass it if the
     value is "0". But that is a meaningful value to
     pack-objects, where it overrides any configured value.

We can fix all of these by simply passing the strings from
the user along to pack-objects verbatim. This does not
actually fix anything for "--depth" or "--window", but these
are converted, too, for consistency.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-23 10:34:53 -08:00
Jeff King
aa8bd519db repack: make parsed string options const-correct
When we use OPT_STRING to parse an option, we get back a
pointer into the argv array, which should be "const char *".
The compiler doesn't notice because it gets passed through a
"void *" in the option struct.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-23 10:34:51 -08:00
Jeff King
44b96ecaa8 repack: fix typo in max-pack-size option
When we see "--max-pack-size", we accidentally propagated
this to pack-objects as "--max_pack_size", which does not
work at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-23 10:34:49 -08:00
Vicent Marti
5cf2741c5a repack: consider bitmaps when performing repacks
Since `pack-objects` will write a `.bitmap` file next to the `.pack` and
`.idx` files, this commit teaches `git-repack` to consider the new
bitmap indexes (if they exist) when performing repack operations.

This implies moving old bitmap indexes out of the way if we are
repacking a repository that already has them, and moving the newly
generated bitmap indexes into the `objects/pack` directory, next to
their corresponding packfiles.

Since `git repack` is now capable of handling these `.bitmap` files,
a normal `git gc` run on a repository that has `pack.writebitmaps` set
to true in its config file will generate bitmap indexes as part of the
garbage collection process.

Alternatively, `git repack` can be called with the `-b` switch to
explicitly generate bitmap indexes if you are experimenting
and don't want them on all the time.

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:23 -08:00
Jeff King
b77fcd1edc repack: handle optional files created by pack-objects
We ask pack-objects to pack to a set of temporary files, and
then rename them into place. Some files that pack-objects
creates may be optional (like a .bitmap file), in which case
we would not want to call rename(). We already call stat()
and make the chmod optional if the file cannot be accessed.
We could simply skip the rename step in this case, but that
would be a minor regression in noticing problems with
non-optional files (like the .pack and .idx files).

Instead, we can now annotate extensions as optional, and
skip them if they don't exist (and otherwise rely on
rename() to barf).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:23 -08:00
Jeff King
42a02d8529 repack: turn exts array into array-of-struct
This is slightly more verbose, but will let us annotate the
extensions with further options in future commits.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:23 -08:00
Jeff King
b328c2166e repack: stop using magic number for ARRAY_SIZE(exts)
We have a static array of extensions, but hardcode the size
of the array in our loops. Let's pull out this magic number,
which will make it easier to change.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:23 -08:00
Junio C Hamano
7794a680e6 Sync with 1.8.5.2
* maint:
  Git 1.8.5.2
  cmd_repack(): remove redundant local variable "nr_packs"
2013-12-17 14:12:17 -08:00
Michael Haggerty
3e7b066e22 cmd_repack(): remove redundant local variable "nr_packs"
Its value is the same as the number of entries in the "names"
string_list, so just use "names.nr" in its place.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Acked-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-17 10:54:41 -08:00
Christian Couder
5955654823 replace {pre,suf}fixcmp() with {starts,ends}_with()
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.

The change can be recreated by mechanically applying this:

    $ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
      grep -v strbuf\\.c |
      xargs perl -pi -e '
        s|!prefixcmp\(|starts_with\(|g;
        s|prefixcmp\(|!starts_with\(|g;
        s|!suffixcmp\(|ends_with\(|g;
        s|suffixcmp\(|!ends_with\(|g;
      '

on the result of preparatory changes in this series.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 14:13:21 -08:00
Stefan Beller
0b63c6a5b7 repack: improve warnings about failure of renaming and removing files
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-17 13:34:57 -07:00
Stefan Beller
ffc9329f48 repack: retain the return value of pack-objects
During the review process of the previous commit (repack: rewrite the
shell script in C), Johannes Sixt proposed to retain any exit codes from
the sub-process, which makes it probably more obvious in case of failure.

As the commit before should behave as close to the original shell
script, the proposed change is put in this extra commit.
The infrastructure however was already setup in the previous commit.
(Having a local 'ret' variable)

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-17 13:34:56 -07:00
Stefan Beller
a1bbc6c017 repack: rewrite the shell script in C
The motivation of this patch is to get closer to a goal of being
able to have a core subset of git functionality built in to git.
That would mean

 * people on Windows could get a copy of at least the core parts
   of Git without having to install a Unix-style shell

 * people using git in on servers with chrooted environments
   do not need to worry about standard tools lacking for shell
   scripts.

This patch is meant to be mostly a literal translation of the
git-repack script; the intent is that later patches would start using
more library facilities, but this patch is meant to be as close to a
no-op as possible so it doesn't do that kind of thing.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-17 13:34:50 -07:00