Commit graph

13 commits

Author SHA1 Message Date
Jeff King
e26a8c4721 repack: extend --keep-unreachable to loose objects
If you use "repack -adk" currently, we will pack all objects
that are already packed into the new pack, and then drop the
old packs. However, loose unreachable objects will be left
as-is. In theory these are meant to expire eventually with
"git prune". But if you are using "repack -k", you probably
want to keep things forever and therefore do not run "git
prune" at all. Meaning those loose objects may build up over
time and end up fooling any object-count heuristics (such as
the one done by "gc --auto", though since git-gc does not
support "repack -k", this really applies to whatever custom
scripts people might have driving "repack -k").

With this patch, we instead stuff any loose unreachable
objects into the pack along with the already-packed
unreachable objects. This may seem wasteful, but it is
really no more so than using "repack -k" in the first place.
We are at a slight disadvantage, in that we have no useful
ordering for the result, or names to hand to the delta code.
However, this is again no worse than what "repack -k" is
already doing for the packed objects. The packing of these
objects doesn't matter much because they should not be
accessed frequently (unless they actually _do_ become
referenced, but then they would get moved to a different
part of the packfile during the next repack).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-14 13:57:45 -07:00
Jeff King
905f27b86a repack: add --keep-unreachable option
The usual way to do a full repack (and what is done by
git-gc) is to run "repack -Ad --unpack-unreachable=<when>",
which will loosen any unreachable objects newer than
"<when>", and drop any older ones.

This is a safer alternative to "repack -ad", because
"<when>" becomes a grace period during which we will not
drop any new objects that are about to be referenced.
However, it isn't perfectly safe. It's always possible that
a process is about to reference an old object. Even if that
process were to take care to update the timestamp on the
object, there is no atomicity with a simultaneously running
"repack" process.

So while unlikely, there is a small race wherein we may drop
an object that is in the process of being referenced. If you
do automated repacking on a large number of active
repositories, you may hit it eventually, and the result is a
corrupted repository.

It would be nice to fix that race in the long run, but it's
complicated.  In the meantime, there is a much simpler
strategy for automated repository maintenance: do not drop
objects at all. We already have a "--keep-unreachable"
option in pack-objects; we just need to plumb it through
from git-repack.

Note that this _isn't_ plumbed through from git-gc, so at
this point it's strictly a tool for people doing their own
advanced repository maintenance strategy.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-14 13:57:42 -07:00
Jeff King
76e057dba2 t7701: fix ignored exit code inside loop
When checking a list of file mtimes, we use a loop and break
out early from the loop if any entry does not match.
However, the exit code of a loop exited via break is always
0, meaning that the test will fail to notice we had a
mismatch. Since the loop is inside a function, we can fix
this by doing an early "return 1".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-25 10:24:13 -07:00
Jeff King
c90f9e13ab repack: pack objects mentioned by the index
When we pack all objects, we use only the objects reachable
from references and reflogs. This misses any objects which
are reachable from the index, but not yet referenced.

By itself this isn't a big deal; the objects can remain
loose until they are actually used in a commit. However, it
does create a problem when we drop packed but unreachable
objects. We try to optimize out the writing of objects that
we will immediately prune, which means we must follow the
same rules as prune in determining what is reachable. And
prune uses the index for this purpose.

This is rather uncommon in practice, as objects in the index
would not usually have been packed in the first place. But
it could happen in a sequence like:

  1. You make a commit on a branch that references blob X.

  2. You repack, moving X into the pack.

  3. You delete the branch (and its reflog), so that X is
     unreferenced.

  4. You "git add" blob X so that it is now referenced only
     by the index.

  5. You repack again with git-gc. The pack-objects we
     invoke will see that X is neither referenced nor
     recent and not bother loosening it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:07 -07:00
Jeff King
7e52f5660e gc: do not explode objects which will be immediately pruned
When we pack everything into one big pack with "git repack
-Ad", any unreferenced objects in to-be-deleted packs are
exploded into loose objects, with the intent that they will
be examined and possibly cleaned up by the next run of "git
prune".

Since the exploded objects will receive the mtime of the
pack from which they come, if the source pack is old, those
loose objects will end up pruned immediately. In that case,
it is much more efficient to skip the exploding step
entirely for these objects.

This patch teaches pack-objects to receive the expiration
information and avoid writing these objects out. It also
teaches "git gc" to pass the value of gc.pruneexpire to
repack (which in turn learns to pass it along to
pack-objects) so that this optimization happens
automatically during "git gc" and "git gc --auto".

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-11 11:09:49 -07:00
Junio C Hamano
713c79e84b more war on "sleep" in tests
Two more tests that sleep only to waste tick can be converted to use
test_tick and take expiry parameters relative to $test_tick.  The basic
idea is to replace "sleep 1" with "test_tick" to cause the "time" to pass.

These tests are interested in expiring things with "now" as the timestamp,
soo use a timestamp relative to $test_tick to give them more stability and
reproducibility.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-17 18:20:23 -07:00
Johannes Sixt
d04099382b Windows: Fix intermittent failures of t7701
The last test case checks whether unpacked objects receive the time stamp
of the pack file. Due to different implementations of stat(2) by MSYS and
our version in compat/mingw.c, the test fails in about half of the test
runs.

Note the following facts:

- The test uses perl's -M operator to compare the time stamps. Since we
  depend on MSYS perl, the result of this operator is based on MSYS's
  implementation of the stat(2) call.

- NTFS on Windows records fractional seconds.

- The MSYS implementation of stat(2) *rounds* fractional seconds to full
  seconds instead of truncating them. This becomes obvious by comparing the
  modification times reported by 'ls --full-time $f' and 'stat $f' for
  various files $f.

- Our implementation of stat(2) in compat/mingw.c *truncates* to full
  seconds.

The consequence of this is that

- add_packed_git() picks up a truncated whole second modification time
  from the pack file time stamp, which is then used for the loose objects,
  while the pack file retains its time stamp in fractional seconds;

- but the test case compared the pack file's rounded modification times
  to the loose objects' truncated modification times.

And half of the time the rounded modification time is not the same as its
truncated modification time.

The fix is that we replace perl by 'test-chmtime -v +0', which prints the
truncated whole-second mtime without modifying it.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-28 10:31:04 -08:00
Junio C Hamano
c9d8563c81 Merge branch 'bc/maint-keep-pack'
* bc/maint-keep-pack:
  repack: only unpack-unreachable if we are deleting redundant packs
2008-11-16 00:49:02 -08:00
Brandon Casey
83d0289df6 repack: only unpack-unreachable if we are deleting redundant packs
The -A option calls pack-objects with the --unpack-unreachable option so
that the unreachable objects in local packs are left in the local object
store loose. But if the -d option to repack was _not_ used, then these
unpacked loose objects are redundant and unnecessary.

Update tests in t7701.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-14 21:39:10 -08:00
Jeff King
87539416fd tests: grep portability fixes
We try to avoid using the "-q" or "-e" options, as they are
largely useless, as explained in aadbe44f.

There is one exception for "-e" here, which is in t7701 used
to produce an "or" of patterns. This can be rewritten as an
egrep pattern.

This patch also removes use of "grep -F" in favor of the
more widely available "fgrep".

[sp: Tested on AIX 5.3 by Mike Ralphson,
     Tested on MinGW by Johannes Sixt]

Signed-off-by: Jeff King <peff@peff.net>
Tested-by: Mike Ralphson <mike@abacus.co.uk>
Tested-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-09-30 12:39:58 -07:00
Nanako Shiraishi
47a528ad24 tests: use "git xyzzy" form (t7200 - t9001)
Converts tests between t7201-t9001.

Signed-off-by: Nanako Shiraishi <nanako3@lavabit.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-03 14:51:48 -07:00
Brandon Casey
e896912c5e t7701-repack-unpack-unreachable.sh: check timestamp of unpacked objects
Unpacked objects should receive the timestamp of the pack they were
unpacked from. Check.

Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-08 16:56:07 -07:00
Brandon Casey
ccc1297226 repack: modify behavior of -A option to leave unreferenced objects unpacked
The previous behavior of the -A option was to retain any previously
packed objects which had become unreferenced, and place them into the newly
created pack file.  Since git-gc, when run automatically with the --auto
option, calls repack with the -A option, this had the effect of retaining
unreferenced packed objects indefinitely. To avoid this scenario, the
user was required to run git-gc with the little known --prune option or
to manually run repack with the -a option.

This patch changes the behavior of the -A option so that unreferenced
objects that exist in any pack file being replaced, will be unpacked into
the repository. The unreferenced loose objects can then be garbage collected
by git-gc (i.e. git-prune) based on the gc.pruneExpire setting.

Also add new tests for checking whether unreferenced objects which were
previously packed are properly left in the repository unpacked after
repacking.

Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-11 11:24:48 -07:00