Commit graph

77 commits

Author SHA1 Message Date
Patrick Steinhardt b1df3b3867 commit-graph: disable GIT_COMMIT_GRAPH_PARANOIA by default
In 7a5d604443 (commit: detect commits that exist in commit-graph but not
in the ODB, 2023-10-31), we have introduced a new object existence check
into `repo_parse_commit_internal()` so that we do not parse commits via
the commit-graph that don't have a corresponding object in the object
database. This new check of course comes with a performance penalty,
which the commit put at around 30% for `git rev-list --topo-order`. But
there are in fact scenarios where the performance regression is even
higher. The following benchmark against linux.git with a fully-build
commit-graph:

  Benchmark 1: git.v2.42.1 rev-list --count HEAD
    Time (mean ± σ):     658.0 ms ±   5.2 ms    [User: 613.5 ms, System: 44.4 ms]
    Range (min … max):   650.2 ms … 666.0 ms    10 runs

  Benchmark 2: git.v2.43.0-rc1 rev-list --count HEAD
    Time (mean ± σ):      1.333 s ±  0.019 s    [User: 1.263 s, System: 0.069 s]
    Range (min … max):    1.302 s …  1.361 s    10 runs

  Summary
    git.v2.42.1 rev-list --count HEAD ran
      2.03 ± 0.03 times faster than git.v2.43.0-rc1 rev-list --count HEAD

While it's a noble goal to ensure that results are the same regardless
of whether or not we have a potentially stale commit-graph, taking twice
as much time is a tough sell. Furthermore, we can generally assume that
the commit-graph will be updated by git-gc(1) or git-maintenance(1) as
required so that the case where the commit-graph is stale should not at
all be common.

With that in mind, default-disable GIT_COMMIT_GRAPH_PARANOIA and restore
the behaviour and thus performance previous to the mentioned commit. In
order to not be inconsistent, also disable this behaviour by default in
`lookup_commit_in_graph()`, where the object existence check has been
introduced right at its inception via f559d6d45e (revision: avoid
hitting packfiles when commits are in commit-graph, 2021-08-09).

This results in another speedup in commands that end up calling this
function, even though it's less pronounced compared to the above
benchmark. The following has been executed in linux.git with ~1.2
million references:

  Benchmark 1: GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted
    Time (mean ± σ):      2.947 s ±  0.003 s    [User: 2.412 s, System: 0.534 s]
    Range (min … max):    2.943 s …  2.949 s    3 runs

  Benchmark 2: GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted
    Time (mean ± σ):      2.724 s ±  0.030 s    [User: 2.207 s, System: 0.514 s]
    Range (min … max):    2.704 s …  2.759 s    3 runs

  Summary
    GIT_COMMIT_GRAPH_PARANOIA=false git rev-list --all --no-walk=unsorted ran
      1.08 ± 0.01 times faster than GIT_COMMIT_GRAPH_PARANOIA=true git rev-list --all --no-walk=unsorted

So whereas 7a5d604443 initially introduced the logic to start doing an
object existence check in `repo_parse_commit_internal()` by default, the
updated logic will now instead cause `lookup_commit_in_graph()` to stop
doing the check by default. This behaviour continues to be tweakable by
the user via the GIT_COMMIT_GRAPH_PARANOIA environment variable.

Note that this requires us to amend some tests to manually turn on the
paranoid checks again. This is because we cause repository corruption by
manually deleting objects which are part of the commit graph already.
These circumstances shouldn't usually happen in repositories.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-26 10:10:00 +09:00
Junio C Hamano 79861babe2 Merge branch 'tb/repack-max-cruft-size'
"git repack" learned "--max-cruft-size" to prevent cruft packs from
growing without bounds.

* tb/repack-max-cruft-size:
  repack: free existing_cruft array after use
  builtin/repack.c: avoid making cruft packs preferred
  builtin/repack.c: implement support for `--max-cruft-size`
  builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE
  t7700: split cruft-related tests to t7704
2023-10-18 13:25:41 -07:00
Taylor Blau 78de1c6c32 t7700: split cruft-related tests to t7704
A small handful of the tests in t7700 (the main script for testing
functionality of 'git repack') are specifically related to cruft pack
operations.

Prepare for adding new cruft pack-related tests by moving the existing
set into a new test script.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 18:28:47 -07:00
Christian Couder 71c5aec1f5 repack: implement --filter-to for storing filtered out objects
A previous commit has implemented `git repack --filter=<filter-spec>` to
allow users to filter out some objects from the main pack and move them
into a new different pack.

It would be nice if this new different pack could be created in a
different directory than the regular pack. This would make it possible
to move large blobs into a pack on a different kind of storage, for
example cheaper storage.

Even in a different directory, this pack can be accessible if, for
example, the Git alternates mechanism is used to point to it. In fact
not using the Git alternates mechanism can corrupt a repo as the
generated pack containing the filtered objects might not be accessible
from the repo any more. So setting up the Git alternates mechanism
should be done before using this feature if the user wants the repo to
be fully usable while this feature is used.

In some cases, like when a repo has just been cloned or when there is no
other activity in the repo, it's Ok to setup the Git alternates
mechanism afterwards though. It's also Ok to just inspect the generated
packfile containing the filtered objects and then just move it into the
'.git/objects/pack/' directory manually. That's why it's not necessary
for this command to check that the Git alternates mechanism has been
already setup.

While at it, as an example to show that `--filter` and `--filter-to`
work well with other options, let's also add a test to check that these
options work well with `--max-pack-size`.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 14:54:31 -07:00
Christian Couder 48a9b67b43 repack: add --filter=<filter-spec> option
This new option puts the objects specified by `<filter-spec>` into a
separate packfile.

This could be useful if, for example, some blobs take up a lot of
precious space on fast storage while they are rarely accessed. It could
make sense to move them into a separate cheaper, though slower, storage.

It's possible to find which new packfile contains the filtered out
objects using one of the following:

  - `git verify-pack -v ...`,
  - `test-tool find-pack ...`, which a previous commit added,
  - `--filter-to=<dir>`, which a following commit will add to specify
    where the pack containing the filtered out objects will be.

This feature is implemented by running `git pack-objects` twice in a
row. The first command is run with `--filter=<filter-spec>`, using the
specified filter. It packs objects while omitting the objects specified
by the filter. Then another `git pack-objects` command is launched using
`--stdin-packs`. We pass it all the previously existing packs into its
stdin, so that it will pack all the objects in the previously existing
packs. But we also pass into its stdin, the pack created by the previous
`git pack-objects --filter=<filter-spec>` command as well as the kept
packs, all prefixed with '^', so that the objects in these packs will be
omitted from the resulting pack. The result is that only the objects
filtered out by the first `git pack-objects` command are in the pack
resulting from the second `git pack-objects` command.

As the interactions with kept packs are a bit tricky, a few related
tests are added.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 14:54:30 -07:00
Derrick Stolee 0af067276e builtin/repack.c: only repack .packs that exist
In 73320e49ad (builtin/repack.c: only collect fully-formed packs,
2023-06-07), we switched the check for which packs to collect by
starting at the .idx files and looking for matching .pack files. This
avoids trying to repack pack-files that have not had their pack-indexes
installed yet.

However, it does cause maintenance to halt if we find the (problematic,
but not insurmountable) case of a .idx file without a corresponding
.pack file. In an environment where packfile maintenance is a critical
function, such a hard stop is costly and requires human intervention to
resolve (by deleting the .idx file).

This was not the case before. We successfully repacked through this
scenario until the recent change to scan for .idx files.

Further, if we are actually in a case where objects are missing, we
detect this at a different point during the reachability walk.

In other cases, Git prepares its list of packfiles by scanning .idx
files and then only adds it to the packfile list if the corresponding
.pack file exists. It even does so without a warning! (See
add_packed_git() in packfile.c for details.)

This case is much less likely to occur than the failures seen before
73320e49ad. Packfiles are "installed" by writing the .pack file before
the .idx and that process can be interrupted. Packfiles _should_ be
deleted by deleting the .idx first, followed by the .pack file, but
unlink_pack_path() does not do this: it deletes the .pack _first_,
allowing a window where this process could be interrupted. We leave the
consideration of changing this order as a separate concern. Knowing that
this condition is possible from interrupted Git processes and not other
tools lends some weight that Git should be more flexible around this
scenario.

Add a check to see if the .pack file exists before adding it to the list
for repacking. This will stop a number of maintenance failures seen in
production but fixed by deleting the .idx files.

This brings us closer to the case before 73320e49ad in that 'git
repack' will not fail when there is an orphaned .idx file, at least, not
due to the way we scan for packfiles. In the case that the .pack file
was erroneously deleted without copies of its objects in other installed
packfiles, then 'git repack' will fail due to the reachable object walk.

This does resolve the case where automated repacks will no longer be
halted on this case. The tests in t7700 show both these successful
scenarios and the case of failing if the .pack was truly required.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-11 13:07:50 -07:00
Junio C Hamano e224f26892 Merge branch 'tb/collect-pack-filenames-fix'
Avoid breakage of "git pack-objects --cruft" due to inconsistency
between the way the code enumerates packfiles in the repository.

* tb/collect-pack-filenames-fix:
  builtin/repack.c: only collect fully-formed packs
2023-06-26 09:29:50 -07:00
Taylor Blau 73320e49ad builtin/repack.c: only collect fully-formed packs
To partition the set of packs based on which ones are "kept" (either
they have a .keep file, or were otherwise marked via the `--keep-pack`
option) and "non-kept" ones (anything else), `git repack` uses its
`collect_pack_filenames()` function.

Ordinarily, we would rely on a convenience function such as
`get_all_packs()` to enumerate and partition the set of packs. But
`collect_pack_filenames()` uses `readdir()` directly to read the
contents of the "$GIT_DIR/objects/pack" directory, and adds each entry
ending in ".pack" to the appropriate list (either kept, or non-kept as
above).

This is subtly racy, since `collect_pack_filenames()` may see a pack
that is not fully staged (i.e., it is missing its ".idx" file).
Ordinarily, this doesn't cause a problem. But it can cause issues when
generating a cruft pack.

This is because `git repack` feeds (among other things) the list of
existing kept packs down to `git pack-objects --cruft` to indicate that
any kept packs will not be removed from the repository (so that the
cruft pack machinery can avoid packing objects that appear in those
packs as cruft).

But `read_cruft_objects()` lists packfiles by calling `get_all_packs()`.
So if a ".pack" file exists (necessary to get that pack to appear to
`collect_pack_filenames()`), but doesn't have a corresponding ".idx"
file (necessary to get that pack to appear via `get_all_packs()`), we'll
complain with:

    fatal: could not find pack '.tmp-5841-pack-a6b0150558609c323c496ced21de6f4b66589260.pack'

Fix the above by teaching `collect_pack_filenames()` to only collect
packs with their corresponding `*.idx` files in place, indicating that
those packs have been fully staged.

There are a couple of things worth noting:

  - Since each entry in the `extra_keep` list (which contains the
    `--keep-pack` names) has a `*.pack` suffix, we'll have to swap the
    suffix from ".pack" to ".idx", and compare that instead.

  - Since we use the the `fname_kept_list` to figure out which packs to
    delete (with `git repack -d`), we would have previously deleted a
    `*.pack` with no index (since the existince of a ".pack" file is
    necessary and sufficient to include that pack in the list of
    existing non-kept packs).

    Now we will leave it alone (since that pack won't appear in the
    list). This is far more correct behavior, since we don't want
    to race with a pack being staged. Deleting a partially staged pack
    is unlikely, however, since the window of time between staging a
    pack and moving its .idx file into place is miniscule.

    Note that this window does *not* include the time it takes to
    receive and index the pack, since the incoming data goes into
    "$GIT_DIR/objects/tmp_pack_XXXXXX", which does not end in ".pack"
    and is thus ignored by collect_pack_filenames().

In the future, this function should probably be rewritten as a callback
to `for_each_file_in_pack_dir()`, but this is the simplest change we
could do in the short-term.

Reported-by: Michael Haggerty <mhagger@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12 13:54:38 -07:00
John Cai 3a3b98be91 t7700-repack: modernize test format
Some tests still use the old format with four spaces indentation.
Standardize the tests to the new format with tab indentation.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-23 12:54:29 +09:00
Patrick Steinhardt d85cd18777 repack: disable writing bitmaps when doing a local repack
In order to write a bitmap, we need to have full coverage of all objects
that are about to be packed. In the traditional non-multi-pack-index
world this meant we need to do a full repack of all objects into a
single packfile. But in the new multi-pack-index world we can get away
with writing bitmaps when we have multiple packfiles as long as the
multi-pack-index covers all objects.

This is not always the case though. When asked to perform a repack of
local objects, only, then we cannot guarantee to have full coverage of
all objects regardless of whether we do a full repack or a repack with a
multi-pack-index. The end result is that writing the bitmap will fail in
both worlds:

    $ git multi-pack-index write --stdin-packs --bitmap <packfiles
    warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing)
    error: could not write multi-pack bitmap

Now there are two different ways to fix this. The first one would be to
amend git-multi-pack-index(1) to disable writing bitmaps when we notice
that we don't have full object coverage.

    - We don't have enough information in git-multi-pack-index(1) in
      order to tell whether the local repository _should_ have full
      coverage. Because even when connected to an alternate object
      directory, it may be the case that we still have all objects
      around in the main object database.

    - git-multi-pack-index(1) is quite a low-level tool. Automatically
      disabling functionality that it was asked to provide does not feel
      like the right thing to do.

We can easily fix it at a higher level in git-repack(1) though. When
asked to only include local objects via `-l` and when connected to an
alternate object directory then we will override the user's ask and
disable writing bitmaps with a warning. This is similar to what we do in
git-pack-objects(1), where we also disable writing bitmaps in case we
omit an object from the pack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 10:27:52 -07:00
Junio C Hamano f32996d99a Merge branch 'gc/resolve-alternate-symlinks'
Resolve symbolic links when processing the locations of alternate
object stores, since failing to do so can lead to confusing and buggy
behavior.

* gc/resolve-alternate-symlinks:
  object-file: use real paths when adding alternates
2022-11-29 10:41:05 +09:00
Glen Choo 199337d6ec object-file: use real paths when adding alternates
When adding an alternate ODB, we check if the alternate has the same
path as the object dir, and if so, we do nothing. However, that
comparison does not resolve symlinks. This makes it possible to add the
object dir as an alternate, which may result in bad behavior. For
example, it can trick "git repack -a -l -d" (possibly run by "git gc")
into thinking that all packs come from an alternate and delete all
objects.

	rm -rf test &&
	git clone https://github.com/git/git test &&
	(
	cd test &&
	ln -s objects .git/alt-objects &&
	# -c repack.updateserverinfo=false silences a warning about not
	# being able to update "info/refs", it isn't needed to show the
	# bad behavior
	GIT_ALTERNATE_OBJECT_DIRECTORIES=".git/alt-objects" git \
		-c repack.updateserverinfo=false repack -a -l -d  &&
	# It's broken!
	git status
	# Because there are no more objects!
	ls .git/objects/pack
	)

Fix this by resolving symlinks and relative paths before comparing the
alternate and object dir. This lets us clean up a number of issues noted
in 37a95862c6 (alternates: re-allow relative paths from environment,
2016-11-07):

- Now that we compare the real paths, duplicate detection is no longer
  foiled by relative paths.
- Using strbuf_realpath() allows us to "normalize" paths that
  strbuf_normalize_path() can't, so we can stop silently ignoring errors
  when "normalizing" paths from the environment.
- We now store an absolute path based on getcwd() (the "future
  direction" named in 37a95862c6), so chdir()-ing in the process no
  longer changes the directory pointed to by the alternate. This is a
  change in behavior, but a desirable one.

Signed-off-by: Glen Choo <chooglen@google.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-11-25 09:44:08 +09:00
Taylor Blau ad9096881d Merge branch 'tb/repack-expire-to'
"git repack" learns to send cruft objects out of the way into
packfiles outside the repository.

* tb/repack-expire-to:
  builtin/repack.c: implement `--expire-to` for storing pruned objects
  builtin/repack.c: write cruft packs to arbitrary locations
  builtin/repack.c: pass "cruft_expiration" to `write_cruft_pack`
  builtin/repack.c: pass "out" to `prepare_pack_objects`
2022-11-18 18:43:09 -05:00
Taylor Blau c88895e67b Merge branch 'jk/repack-tempfile-cleanup'
The way "git repack" creared temporary files when it received a
signal was prone to deadlocking, which has been corrected.

* jk/repack-tempfile-cleanup:
  t7700: annotate cruft-pack failure with ok=sigpipe
  repack: drop remove_temporary_files()
  repack: use tempfiles for signal cleanup
  repack: expand error message for missing pack files
  repack: populate extension bits incrementally
  repack: convert "names" util bitfield to array
2022-10-30 21:04:42 -04:00
Junio C Hamano c5dd7773e1 Merge branch 'tb/remove-unused-pack-bitmap'
When creating a multi-pack bitmap, remove per-pack bitmap files
unconditionally as they will never be consulted.

* tb/remove-unused-pack-bitmap:
  builtin/repack.c: remove redundant pack-based bitmaps
2022-10-28 11:26:54 -07:00
Taylor Blau 91badeba32 builtin/repack.c: implement --expire-to for storing pruned objects
When pruning objects with `--cruft`, `git repack` offers some
flexibility when selecting the set of which objects are pruned via the
`--cruft-expiration` option.

This is useful for expiring objects which are older than the grace
period, making races where to-be-pruned objects become reachable and
then ancestors of freshly pushed objects, leaving the repository in a
corrupt state after pruning substantially less likely [1].

But in practice, such races are impossible to avoid entirely, no matter
how long the grace period is. To prevent this race, it is often
advisable to temporarily put a repository into a read-only state. But in
practice, this is not always practical, and so some middle ground would
be nice.

This patch introduces a new option, `--expire-to`, which teaches `git
repack` to write an additional cruft pack containing just the objects
which were pruned from the repository. The caller can specify a
directory outside of the current repository as the destination for this
second cruft pack.

This makes it possible to prune objects from a repository, while still
holding onto a supplemental copy of them outside of the original
repository. Having this copy on-disk makes it substantially easier to
recover objects when the aforementioned race is encountered.

`--expire-to` is implemented in a somewhat convoluted manner, which is
to take advantage of the fact that the first time `write_cruft_pack()`
is called, it adds the name of the cruft pack to the `names` string
list. That means the second time we call `write_cruft_pack()`, objects
in the previously-written cruft pack will be excluded.

As long as the caller ensures that no objects are expired during the
second pass, this is sufficient to generate a cruft pack containing all
objects which don't appear in any of the new packs written by `git
repack`, including the cruft pack. In other words, all of the objects
which are about to be pruned from the repository.

It is important to note that the destination in `--expire-to` does not
necessarily need to be a Git repository (though it can be) Notably, the
expired packs do not contain all ancestors of expired objects. So if the
source repository contains something like:

              <unreachable>
             /
    C1 --- C2
      \
       refs/heads/master

where C2 is unreachable, but has a parent (C1) which is reachable, and
C2 would be pruned, then the expiry pack will contain only C2, not C1.

[1]: https://lore.kernel.org/git/20190319001829.GL29661@sigill.intra.peff.net/

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24 13:39:42 -07:00
Jeff King 9b3fadfd06 t7700: annotate cruft-pack failure with ok=sigpipe
One of our tests intentionally causes the cruft-pack generation phase of
repack to fail, in order to stimulate an exit from repack at the desired
moment. It does so by feeding a bogus option argument to pack-objects.
This is a simple and reliable way to get pack-objects to fail, but it
has one downside: pack-objects will die before reading its stdin, which
means the caller repack may racily get SIGPIPE writing to it.

For the purposes of this test, that's OK. We are checking whether repack
cleans up already-created .tmp files, and it will do so whether it exits
or dies by signal (because the tempfile API hooks both).

But we have to tell test_must_fail that either outcome is OK, or it
complains about the signal. Arguably this is a workaround (compared to
fixing repack), as repack dying to SIGPIPE means that it loses the
opportunity to give a more detailed message. But we don't actually write
such a message anyway; we rely on pack-objects to have written something
useful to stderr, and it does. In either case (signal or exit), that is
the main thing the user will see.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-23 11:08:45 -07:00
Jeff King 193430717a repack: drop remove_temporary_files()
After we've successfully finished the repack, we call
remove_temporary_files(), which looks for and removes any files matching
".tmp-$$-pack-*", where $$ is the pid of the current process. But this
is pointless. If we make it this far in the process, we've already
renamed these tempfiles into place, and there is nothing left to delete.

Nor is there a point in trying to call it to clean up when we _aren't_
successful. It's not safe for using in a signal handler, and the
previous commit already handed that job over to the tempfile API.

It might seem like it would be useful to clean up stray .tmp files left
by other invocations of git-repack. But it won't clean those files; it
only matches ones with its pid, and leaves the rest. Fortunately, those
are cleaned up naturally by successive calls to git-repack; we'll
consider .tmp-*.pack the same as normal packfiles, so "repack -ad", etc,
will roll up their contents and eventually delete them.

The one case that could matter is if pack-objects generates an extension
we don't know about, like ".tmp-pack-$$-$hash.some-new-ext". The current
code will quietly delete such a file, while after this patch we'd leave
it in place. In practice this doesn't happen, and would be indicative of
a bug. Leaving the file as cruft is arguably a better behavior, as it
means somebody is more likely to eventually notice and fix the bug.  If
we really wanted to be paranoid, we could scan for and warn about such
files, but that seems like overkill.

There's nothing to test with regard to the removal of this function. It
was doing nothing, so the behavior should be the same.  However, we can
verify (and protect) our assumption that "repack -ad" will eventually
remove stray files by adding a test for that.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-21 18:03:52 -07:00
Jeff King 9cf10d8786 repack: use tempfiles for signal cleanup
When git-repack exits due to a signal, it tries to clean up by calling
its remove_temporary_files() function, which walks through the packs dir
looking for ".tmp-$$-pack-*" files to delete (where "$$" is the pid of
the current process).

The biggest problem here is that remove_temporary_files() is not safe to
call in a signal handler. It uses opendir(), which isn't on the POSIX
async-signal-safe list. The details will be platform-specific, but a
likely issue is that it needs to allocate memory; if we receive a signal
while inside malloc(), etc, we'll conflict on the allocator lock and
deadlock with ourselves.

We can fix this by just cleaning up the files directly, without walking
the directory. We already know the complete list of .tmp-* files that
were generated, because we recorded them via populate_pack_exts(). When
we find files there, we can use register_tempfile() to record the
filenames. If we receive a signal, then the tempfile API will clean them
up for us, and it's async-safe and pretty battle-tested.

Note that this is slightly racier than the existing scheme. We don't
record the filenames until pack-objects tells us the hash over stdout.
So during the period between it generating the file and reporting the
hash, we'd fail to clean up. However, that period is very small. During
most of the pack generation process pack-objects is using its own
internal tempfiles. It's only at the very end that it moves them into
the names git-repack expects, and then it immediately reports the name
to us. Given that cleanup like this is best effort (after all, we may
get SIGKILL), this level of race is acceptable.

When we register the tempfiles, we'll record them locally and use the
result to call rename_tempfile(), rather than renaming by hand.  This
isn't strictly necessary, as once we've renamed the files they're gone,
and the tempfile API's cleanup unlink() would simply become a pointless
noop. But managing the lifetimes of the tempfile objects is the cleanest
thing to do, and the tempfile pointers naturally fill the same role as
the old booleans.

This patch also fixes another small problem. We only hook signals, and
don't set up an atexit handler. So if we see an error that causes us to
die(), we'll leave the .tmp-* files in place. But since the tempfile API
handles this for us, this is now fixed for free. The new test covers
this by stimulating a failure of pack-objects when generating a cruft
pack. Before this patch, the .tmp-* file for the main pack would have
been left, but now we correctly clean it up.

Two small subtleties on the implementation:

  - in the renaming loop, we can stop re-constructing fname_old; we only
    use it when we have a tempfile to rename, so we can just ask the
    tempfile for its path (which, barring bugs, should be identical)

  - when renaming fails, our error message mentions fname_old. But since
    a failed rename_tempfile() invalidates the tempfile struct, we'll
    lose access to that string. Instead, let's mention the destination
    filename, which is what most other callers do.

Reported-by: Jan Pokorný <poki@fnusa.cz>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-21 18:03:52 -07:00
Taylor Blau 197443e80a repack: don't remove .keep packs with --pack-kept-objects
`git repack` supports a `--pack-kept-objects` flag which more or less
translates to whether or not we pass `--honor-pack-keep` down to `git
pack-objects` when assembling a new pack.

This behavior has existed since ee34a2bead (repack: add
`repack.packKeptObjects` config var, 2014-03-03). In that commit, the
documentation was extended to say:

    [...] Note that we still do not delete `.keep` packs after
    `pack-objects` finishes.

Unfortunately, this is not the case when `--pack-kept-objects` is
combined with a `--geometric` repack. When doing a geometric repack, we
include `.keep` packs when enumerating available packs only when
`pack_kept_objects` is set.

So this all works fine when `--no-pack-kept-objects` (or similar) is
given. Kept packs are excluded from the geometric roll-up, so when we go
to delete redundant packs (with `-d`), no `.keep` packs appear "below
the split" in our geometric progression.

But when `--pack-kept-objects` is given, things can go awry. Namely,
when a kept pack is included in the list of packs tracked by the
`pack_geometry` struct *and* part of the pack roll-up, we will delete
the `.keep` pack when we shouldn't.

Note that this *doesn't* result in object corruption, since the `.keep`
pack's objects are still present in the new pack. But the `.keep` pack
itself is removed, which violates our promise from back in ee34a2bead.

But there's more. Because `repack` computes the geometric roll-up
independently from selecting which packs belong in a MIDX (with
`--write-midx`), this can lead to odd behavior. Consider when a `.keep`
pack appears below the geometric split (ie., its objects will be part of
the new pack we generate).

We'll write a MIDX containing the new pack along with the existing
`.keep` pack. But because the `.keep` pack appears below the geometric
split line, we'll (incorrectly) try to remove it. While this doesn't
corrupt the repository, it does cause us to remove the MIDX we just
wrote, since removing that pack would invalidate the new MIDX.

Funny enough, this behavior became far less noticeable after e4d0c11c04
(repack: respect kept objects with '--write-midx -b', 2021-12-20), which
made `pack_kept_objects` be enabled by default only when we were writing
a non-MIDX bitmap.

But e4d0c11c04 didn't resolve this bug, it just made it harder to notice
unless callers explicitly passed `--pack-kept-objects`.

The solution is to avoid trying to remove `.keep` packs during
`--geometric` repacks, even when they appear below the geometric split
line, which is the approach this patch implements.

Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-17 21:29:23 -07:00
Taylor Blau 55d902cd61 builtin/repack.c: remove redundant pack-based bitmaps
When we write a MIDX bitmap after repacking, it is possible that the
repository would be left in a state with both pack- and multi-pack
reachability bitmaps.

This can occur, for instance, if a pack that was kept (either by having
a .keep file, or during a geometric repack in which it is not rolled up)
has a bitmap file, and the repack wrote a multi-pack index and bitmap.

When loading a reachability bitmap for the repository, the multi-pack
one is always preferred, so the pack-based one is redundant. Let's
remove it unconditionally, even if '-d' isn't passed, since there is no
practical reason to keep both around. The patch below does just that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-17 21:26:16 -07:00
Derrick Stolee f1486203f5 t7700: check post-condition in kept-pack test
The '--write-midx -b packs non-kept objects' test in t7700-repack.sh
uses test_subcommand_inexact to check that 'git repack' properly adds
the '--honor-pack-keep' flag to the 'git pack-objects' subcommand.
However, the test_subcommand_inexact helper is more flexible than
initially designed, and this instance is the only one that makes use of
it: there are additional arguments between 'git pack-objects' and the
'--honor-pack-keep' flag. In order to make test_subcommand_inexact more
strict, we need to fix this instance.

This test checks that 'git repack --write-midx -a -b -d' will create a
new pack-file that does not contain the objects within the kept pack.
This behavior is possible because of the multi-pack-index bitmap that
will bitmap objects against multiple packs. Without --write-midx, the
objects in the kept pack would be duplicated so the resulting pack is
closed under reachability and bitmaps can be created against it. This is
discussed in more detail in e4d0c11c0 (repack: respect kept objects with
'--write-midx -b', 2021-12-20) which also introduced this instance of
test_subcommand_inexact.

To better verify the intended post-conditions while also removing this
instance of test_subcommand_inexact, rewrite the test to check the list
of packed objects in the kept pack and the list of the objects in the
newly-repacked pack-file _other_ than the kept pack. These lists should
be disjoint.

Be sure to include a non-kept pack-file and loose objects to be extra
careful that this is properly behaving with kept packs and not just
avoiding repacking all pack-files.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-25 13:14:59 -07:00
Junio C Hamano bfce3e7b92 Merge branch 'ps/repack-with-server-info'
"git repack" learned a new configuration to disable triggering of
age-old "update-server-info" command, which is rarely useful these
days.

* ps/repack-with-server-info:
  repack: add config to skip updating server info
  repack: refactor to avoid double-negation of update-server-info
2022-03-23 14:09:30 -07:00
Patrick Steinhardt a2565c48e4 repack: add config to skip updating server info
By default, git-repack(1) will update server info that is required by
the dumb HTTP transport. This can be skipped by passing the `-n` flag,
but what we're noticably missing is a config option to permanently
disable updating this information.

Add a new option "repack.updateServerInfo" which can be used to disable
the logic. Most hosting providers have turned off the dumb HTTP protocol
anyway, and on the client-side it woudln't typically be useful either.
Giving a persistent way to disable this feature thus makes quite some
sense to avoid wasting compute cycles and storage.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-14 22:25:13 +00:00
Patrick Steinhardt 64a6151da7 repack: refactor to avoid double-negation of update-server-info
By default, git-repack(1) runs `update_server_info()` to generate info
required for the dumb HTTP protocol. This can be disabled via the `-n`
flag, which then sets the `no_update_server_info` flag. Further down the
code this leads to some double-negation logic, which is about to become
more confusing as we're about to add a new config which allows the user
to permanently disable generation of the info.

Refactor the code to avoid the double-negation and add some tests which
verify that the flag continues to work as expected.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-14 22:24:59 +00:00
Junio C Hamano f2cb46a6b3 Merge branch 'tb/midx-bitmap-corruption-fix'
A bug that made multi-pack bitmap and the object order out-of-sync,
making the .midx data corrupt, has been fixed.

* tb/midx-bitmap-corruption-fix:
  pack-bitmap.c: gracefully fallback after opening pack/MIDX
  midx: read `RIDX` chunk when present
  t/lib-bitmap.sh: parameterize tests over reverse index source
  t5326: move tests to t/lib-bitmap.sh
  t5326: extract `test_rev_exists`
  t5326: drop unnecessary setup
  pack-revindex.c: instrument loading on-disk reverse index
  midx.c: make changing the preferred pack safe
  t5326: demonstrate bitmap corruption after permutation
2022-02-16 15:14:29 -08:00
Taylor Blau 7f514b7a5e midx: read RIDX chunk when present
When a MIDX contains the new `RIDX` chunk, ensure that the reverse index
is read from it instead of the on-disk .rev file. Since we need to
encode the object order in the MIDX itself for correctness reasons,
there is no point in storing the same data again outside of the MIDX.

So, this patch stops writing separate .rev files, and reads it out of
the MIDX itself. This is possible to do with relatively little new code,
since the format of the RIDX chunk is identical to the data in the .rev
file. In other words, we can implement this by pointing the
`revindex_data` field at the reverse index chunk of the MIDX instead of
the .rev file without any other changes.

Note that we have two knobs that are adjusted for the new tests:
GIT_TEST_MIDX_WRITE_REV and GIT_TEST_MIDX_READ_RIDX. The former controls
whether the MIDX .rev is written at all, and the latter controls whether
we read the MIDX's RIDX chunk.

Both are necessary to ensure that the test added at the beginning of
this series continues to work. This is because we always need to write
the RIDX chunk in the MIDX in order to change its checksum, but we want
to make sure reading the existing .rev file still works (since the RIDX
chunk takes precedence by default).

Arguably this isn't a very interesting mode to test, because the
precedence rules mean that we'll always read the RIDX chunk over the
.rev file. But it makes it impossible for a user to induce corruption in
their repository by adjusting the test knobs (since if we had an
either/or knob they could stop writing the RIDX chunk, allowing them to
tweak the MIDX's object order without changing its checksum).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-27 12:07:53 -08:00
Junio C Hamano 88a516aca0 Merge branch 'ds/repack-fixlets'
Two fixes around "git repack".

* ds/repack-fixlets:
  repack: make '--quiet' disable progress
  repack: respect kept objects with '--write-midx -b'
2022-01-05 14:01:30 -08:00
Derrick Stolee 47ca93d071 repack: make '--quiet' disable progress
While testing some ideas in 'git repack', I ran it with '--quiet' and
discovered that some progress output was still shown. Specifically, the
output for writing the multi-pack-index showed the progress.

The 'show_progress' variable in cmd_repack() is initialized with
isatty(2) and is not modified at all by the '--quiet' flag. The
'--quiet' flag modifies the po_args.quiet option which is translated
into a '--quiet' flag for the 'git pack-objects' child process. However,
'show_progress' is used to directly send progress information to the
multi-pack-index writing logic which does not use a child process.

The fix here is to modify 'show_progress' to be false if po_opts.quiet
is true, and isatty(2) otherwise. This new expectation simplifies a
later condition that checks both.

Update the documentation to make it clear that '-q' will disable all
progress in addition to ensuring the 'git pack-objects' child process
will receive the flag.

Use 'test_terminal' to check that this works to get around the isatty(2)
check.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-20 11:59:17 -08:00
Derrick Stolee e4d0c11c04 repack: respect kept objects with '--write-midx -b'
Historically, we needed a single packfile in order to have reachability
bitmaps. This introduced logic that when 'git repack' had a '-b' option
that we should stop sending the '--honor-pack-keep' option to the 'git
pack-objects' child process, ensuring that we create a packfile
containing all reachable objects.

In the world of multi-pack-index bitmaps, we no longer need to repack
all objects into a single pack to have valid bitmaps. Thus, we should
continue sending the '--honor-pack-keep' flag to 'git pack-objects'.

The fix is very simple: only disable the flag when writing bitmaps but
also _not_ writing the multi-pack-index.

This opens the door to new repacking strategies that might want to keep
some historical set of objects in a stable pack-file while only
repacking more recent objects.

To test, create a new 'test_subcommand_inexact' helper that is more
flexible than 'test_subcommand'. This allows us to look for the
--honor-pack-keep flag without over-indexing on the exact set of
arguments.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-20 11:58:31 -08:00
Eric Sunshine 0c51d6b4ae t6000-t9999: detect and signal failure within loop
Failures within `for` and `while` loops can go unnoticed if not detected
and signaled manually since the loop itself does not abort when a
contained command fails, nor will a failure necessarily be detected when
the loop finishes since the loop returns the exit code of the last
command it ran on the final iteration, which may not be the command
which failed. Therefore, detect and signal failures manually within
loops using the idiom `|| return 1` (or `|| exit 1` within subshells).

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 10:29:48 -08:00
Taylor Blau 324efc90d1 builtin/repack.c: pass --refs-snapshot when writing bitmaps
To prevent the race described in an earlier patch, generate and pass a
reference snapshot to the multi-pack bitmap code, if we are writing one
from `git repack`.

This patch is mostly limited to creating a temporary file, and then
calling for_each_ref(). Except we try to minimize duplicates, since
doing so can drastically reduce the size in network-of-forks style
repositories. In the kernel's fork network (the repository containing
all objects from the kernel and all its forks), deduplicating the
references drops the snapshot size from 934 MB to just 12 MB.

But since we're handling duplicates in this way, we have to make sure
that we preferred references (those listed in pack.preferBitmapTips)
before non-preferred ones (to avoid recording an object which is pointed
at by a preferred tip as non-preferred).

We accomplish this by doing separate passes over the references: first
visiting each prefix in pack.preferBitmapTips, and then over the rest of
the references.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 16:40:09 -07:00
Taylor Blau 1d89d88d37 builtin/repack.c: support writing a MIDX while repacking
Teach `git repack` a new `--write-midx` option for callers that wish to
persist a multi-pack index in their repository while repacking.

There are two existing alternatives to this new flag, but they don't
cover our particular use-case. These alternatives are:

  - Call 'git multi-pack-index write' after running 'git repack', or

  - Set 'GIT_TEST_MULTI_PACK_INDEX=1' in your environment when running
    'git repack'.

The former works, but introduces a gap in bitmap coverage between
repacking and writing a new MIDX (since the repack may have deleted a
pack included in the existing MIDX, invalidating it altogether).

Setting the 'GIT_TEST_' environment variable is obviously unsupported.
In fact, even if it were supported officially, it still wouldn't work,
because it generates the MIDX *after* redundant packs have been dropped,
leading to the same issue as above.

Introduce a new option which eliminates this race by teaching `git
repack` to generate the MIDX at the critical point: after the new packs
have been written and moved into place, but before the redundant packs
have been removed.

This option is compatible with `git repack`'s '--bitmap' option (it
changes the interpretation to be: "write a bitmap corresponding to the
MIDX after one has been generated").

There is a little bit of additional noise in the patch below to avoid
repeating ourselves when selecting which packs to delete. Instead of a
single loop as before (where we iterate over 'existing_packs', decide if
a pack is worth deleting, and if so, delete it), we have two loops (the
first where we decide which ones are worth deleting, and the second
where we actually do the deleting). This makes it so we have a single
check we can make consistently when (1) telling the MIDX which packs we
want to exclude, and (2) actually unlinking the redundant packs.

There is also a tiny change to short-circuit the body of
write_midx_included_packs() when no packs remain in the case of an empty
repository. The MIDX code does not handle this, so avoid trying to
generate a MIDX covering zero packs in the first place.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-28 21:20:56 -07:00
Taylor Blau 4b58b6f7b7 t7700: update to work with MIDX bitmap test knob
A number of these tests are focused only on pack-based bitmaps and need
to be updated to disable 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' where
necessary.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 13:56:43 -07:00
Denton Liu d9b31db2c4 t7700: stop losing return codes of git commands
In a pipe, only the return code of the last command is used. Thus, all
other commands will have their return codes masked. Rewrite pipes so
that there are no git commands upstream so that we will know if a
command fails.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-04 14:25:05 -08:00
Denton Liu 3699d69df0 t7700: make references to SHA-1 generic
Make the test more hash-agnostic by renaming variables from "sha1" to
some variation of "oid" or "packid". Also, replace the regex,
`[0-9a-f]\{40\}` with `$OID_REGEX`.

A better name for "incrpackid" (incremental pack-id) might have been
just "packid". However, later in the test suite, we have other uses of
"packid". Although the scopes of these variables don't conflict, a
future developer may think that commit_and_pack() and
test_has_duplicate_object() are semantically related somehow since they
share the same variable name. Give them distinct names so that it's
clear these uses are unrelated.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-04 14:25:05 -08:00
Denton Liu dcf9a748ca t7700: replace egrep with grep
The egrep expressions in this test suite were of the form `^$variable`.
Although egrep works just fine, it's overkill since we're not using any
extended regex. Replace egrep invocations with grep so that we aren't
swatting flies with a sledgehammer.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-04 14:25:05 -08:00
Denton Liu cfe5eda02a t7700: consolidate code into test_has_duplicate_object()
The code to test that objects were not duplicated from the packfile was
duplicated many times. Extract the duplicated code into
test_has_duplicate_object() and use that instead.

Refactor the resulting extraction so that if the git command fails,
the return code is not silently lost.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-04 14:25:05 -08:00
Denton Liu ae475afc0f t7700: consolidate code into test_no_missing_in_packs()
The code to test that objects were not missing from the packfile was
duplicated many times. Extract the duplicated code into
test_no_missing_in_packs() and use that instead.

Refactor the resulting extraction so that if any git commands fail,
their return codes are not silently lost.

Instead of verifying each file of `alt_objects/pack/*.idx` individually
in a for-loop, batch them together into one verification step.

The original testing construct was O(n^2): it used a grep in a loop to
test whether any objects were missing in the packfile. Rewrite this to
extract the hash using sed or cut, sort the files, then use `comm -23`
so that finding missing lines from the original file is done more
efficiently.

While we're at it, add a space to `commit_and_pack ()` for style.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-04 14:25:05 -08:00
Denton Liu 17a4ae92ea t7700: s/test -f/test_path_is_file/
Since we have debugging-friendly alternatives to `test -f`, replace
instances of `test -f` with `test_path_is_file` so that if a command
ever fails, we get better debugging information.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-29 13:20:15 -08:00
Denton Liu d2eee32a89 t7700: move keywords onto their own line
The code style for tests is to have statements on their own line if
possible. Move keywords onto their own line so that they conform with
the test style.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-29 13:20:15 -08:00
Denton Liu 7a1c8c2346 t7700: remove spaces after redirect operators
For shell scripts, the usual convention is for there to be no space
after redirection operators, (e.g. `>file`, not `> file`). Remove these
spaces wherever they appear.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-29 13:20:15 -08:00
Denton Liu 09279086e8 t7700: drop redirections to /dev/null
Since output is silenced when running without `-v` and debugging output
is useful with `-v`, remove redirections to /dev/null as it is not
useful.

In one case where the output of stdout is consumed, redirect the output
of test_commit to stderr.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-29 13:20:15 -08:00
Jeff King 7ff024e7b3 repack: simplify handling of auto-bitmaps and .keep files
Commit 7328482253 (repack: disable bitmaps-by-default if .keep files
exist, 2019-06-29) taught repack to prefer disabling bitmaps to
duplicating objects (unless bitmaps were asked for explicitly).

But there's an easier way to do this: if we keep passing the
--honor-pack-keep flag to pack-objects when auto-enabling bitmaps, then
pack-objects already makes the same decision (it will disable bitmaps
rather than duplicate). Better still, pack-objects can actually decide
to do so based not just on the presence of a .keep file, but on whether
that .keep file actually impacts the new pack we're making (so if we're
racing with a push or fetch, for example, their temporary .keep file
will not block us from generating bitmaps if they haven't yet updated
their refs).

And because repack uses the --write-bitmap-index-quiet flag, we don't
have to worry about pack-objects generating confusing warnings when it
does see a .keep file. We can confirm this by tweaking the .keep test to
check repack's stderr.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-31 13:26:25 -07:00
Jeff King 25575015ca repack: silence warnings when auto-enabled bitmaps cannot be built
Depending on various config options, a full repack may not be able to
build a reachability bitmap index (e.g., if pack.packSizeLimit forces us
to write multiple packs). In these cases pack-objects may write a
warning to stderr.

Since 36eba0323d (repack: enable bitmaps by default on bare repos,
2019-03-14), we may generate these warnings even when the user did not
explicitly ask for bitmaps. This has two downsides:

  - it can be confusing, if they don't know what bitmaps are

  - a daemonized auto-gc will write this to its log file, and the
    presence of the warning may suppress further auto-gc (until
    gc.logExpiry has elapsed)

Let's have repack communicate to pack-objects that the choice to turn on
bitmaps was not made explicitly by the user, which in turn allows
pack-objects to suppress these warnings.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-31 13:15:51 -07:00
Jeff King cc2649ae2d t7700: clean up .keep file in bitmap-writing test
After our test snippet finishes, the .keep file is left in place, making
it hard to do further tests of the auto-bitmap-writing code (since it
suppresses the feature completely). Let's clean it up.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-31 13:14:04 -07:00
Eric Wong 7328482253 repack: disable bitmaps-by-default if .keep files exist
Bitmaps aren't useful with multiple packs, and users with
.keep files ended up with redundant packs when bitmaps
got enabled by default in bare repos.

So detect when .keep files exist and stop enabling bitmaps
by default in that case.

Wasteful (but otherwise harmless) race conditions with .keep files
documented by Jeff King still apply and there's a chance we'd
still end up with redundant data on the FS:

  https://public-inbox.org/git/20190623224244.GB1100@sigill.intra.peff.net/

v2: avoid subshell in test case, be multi-index aware

Fixes: 36eba0323d ("repack: enable bitmaps by default on bare repos")
Signed-off-by: Eric Wong <e@80x24.org>
Helped-by: Jeff King <peff@peff.net>
Reported-by: Janos Farkas <chexum@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-07-01 10:29:08 -07:00
Eric Wong 36eba0323d repack: enable bitmaps by default on bare repos
A typical use case for bare repos is for serving clones and
fetches to clients.  Enable bitmaps by default on bare repos to
make it easier for admins to host git repos in a performant way.

Signed-off-by: Eric Wong <e@80x24.org>
Helped-by: Jeff King <peff@peff.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-18 14:09:54 +09:00
Nguyễn Thái Ngọc Duy ed7e5fc3a2 repack: add --keep-pack option
We allow to keep existing packs by having companion .keep files. This
is helpful when a pack is permanently kept. In the next patch, git-gc
just wants to keep a pack temporarily, for one pack-objects
run. git-gc can use --keep-pack for this use case.

A note about why the pack_keep field cannot be reused and
pack_keep_in_core has to be added. This is about the case when
--keep-pack is specified together with either --keep-unreachable or
--unpack-unreachable, but --honor-pack-keep is NOT specified.

In this case, we want to exclude objects from the packs specified on
command line, not from ones with .keep files. If only one bit flag is
used, we have to clear pack_keep on pack files with the .keep file.

But we can't make any assumption about unreachable objects in .keep
packs. If "pack_keep" field is false for .keep packs, we could
potentially pull lots of unreachable objects into the new pack, or
unpack them loose. The safer approach is ignore all packs with either
.keep file or --keep-pack.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 13:52:29 +09:00
Nguyễn Thái Ngọc Duy e9e33ab0fb t7700: have closing quote of a test at the beginning of line
The closing quote of a test body by convention is always at the start
of line.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 13:52:29 +09:00