git/Documentation/git-multi-pack-index.txt

139 lines
4.5 KiB
Text
Raw Normal View History

git-multi-pack-index(1)
=======================
NAME
----
git-multi-pack-index - Write and verify multi-pack-indexes
SYNOPSIS
--------
[verse]
'git multi-pack-index' [--object-dir=<dir>] [--[no-]bitmap] <sub-command>
DESCRIPTION
-----------
Write or verify a multi-pack-index (MIDX) file.
OPTIONS
-------
--object-dir=<dir>::
Use given directory for the location of Git objects. We check
`<dir>/packs/multi-pack-index` for the current MIDX file, and
`<dir>/packs` for the pack-files to index.
midx: avoid opening multiple MIDXs when writing Opening multiple instance of the same MIDX can lead to problems like two separate packed_git structures which represent the same pack being added to the repository's object store. The above scenario can happen because prepare_midx_pack() checks if `m->packs[pack_int_id]` is NULL in order to determine if a pack has been opened and installed in the repository before. But a caller can construct two copies of the same MIDX by calling get_multi_pack_index() and load_multi_pack_index() since the former manipulates the object store directly but the latter is a lower-level routine which allocates a new MIDX for each call. So if prepare_midx_pack() is called on multiple MIDXs with the same pack_int_id, then that pack will be installed twice in the object store's packed_git pointer. This can lead to problems in, for e.g., the pack-bitmap code, which does something like the following (in pack-bitmap.c:open_pack_bitmap()): struct bitmap_index *bitmap_git = ...; for (p = get_all_packs(r); p; p = p->next) { if (open_pack_bitmap_1(bitmap_git, p) == 0) ret = 0; } which is a problem if two copies of the same pack exist in the packed_git list because pack-bitmap.c:open_pack_bitmap_1() contains a conditional like the following: if (bitmap_git->pack || bitmap_git->midx) { /* ignore extra bitmap file; we can only handle one */ warning("ignoring extra bitmap file: %s", packfile->pack_name); close(fd); return -1; } Avoid this scenario by not letting write_midx_internal() open a MIDX that isn't also pointed at by the object store. So long as this is the case, other routines should prefer to open MIDXs with get_multi_pack_index() or reprepare_packed_git() instead of creating instances on their own. Because get_multi_pack_index() returns `r->object_store->multi_pack_index` if it is non-NULL, we'll only have one instance of a MIDX open at one time, avoiding these problems. To encourage this, drop the `struct multi_pack_index *` parameter from `write_midx_internal()`, and rely instead on the `object_dir` to find (or initialize) the correct MIDX instance. Likewise, replace the call to `close_midx()` with `close_object_store()`, since we're about to replace the MIDX with a new one and should invalidate the object store's memory of any MIDX that might have existed beforehand. Note that this now forbids passing object directories that don't belong to alternate repositories over `--object-dir`, since before we would have happily opened a MIDX in any directory, but now restrict ourselves to only those reachable by `r->objects->multi_pack_index` (and alternate MIDXs that we can see by walking the `next` pointer). As far as I can tell, supporting arbitrary directories with `--object-dir` was a historical accident, since even the documentation says `<alt>` when referring to the value passed to this option. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-01 20:34:01 +00:00
+
`<dir>` must be an alternate of the current repository.
--[no-]progress::
Turn progress on/off explicitly. If neither is specified, progress is
shown if standard error is connected to a terminal. Supported by
sub-commands `write`, `verify`, `expire`, and `repack.
The following subcommands are available:
write::
Write a new MIDX file. The following options are available for
the `write` sub-command:
+
--
--preferred-pack=<pack>::
Optionally specify the tie-breaking pack used when
multiple packs contain the same object. `<pack>` must
contain at least one object. If not given, ties are
broken in favor of the pack with the lowest mtime.
--[no-]bitmap::
Control whether or not a multi-pack bitmap is written.
--stdin-packs::
Write a multi-pack index containing only the set of
line-delimited pack index basenames provided over stdin.
midx: preliminary support for `--refs-snapshot` To figure out which commits we can write a bitmap for, the multi-pack index/bitmap code does a reachability traversal, marking any commit which can be found in the MIDX as eligible to receive a bitmap. This approach will cause a problem when multi-pack bitmaps are able to be generated from `git repack`, since the reference tips can change during the repack. Even though we ignore commits that don't exist in the MIDX (when doing a scan of the ref tips), it's possible that a commit in the MIDX reaches something that isn't. This can happen when a multi-pack index contains some pack which refers to loose objects (e.g., if a pack was pushed after starting the repack but before generating the MIDX which depends on an object which is stored as loose in the repository, and by definition isn't included in the multi-pack index). By taking a snapshot of the references before we start repacking, we can close that race window. In the above scenario (where we have a packed object pointing at a loose one), we'll either (a) take a snapshot of the references before seeing the packed one, or (b) take it after, at which point we can guarantee that the loose object will be packed and included in the MIDX. This patch does just that. It writes a temporary "reference snapshot", which is a list of OIDs that are at the ref tips before writing a multi-pack bitmap. References that are "preferred" (i.e,. are a suffix of at least one value of the 'pack.preferBitmapTips' configuration) are marked with a special '+'. The format is simple: one line per commit at each tip, with an optional '+' at the beginning (for preferred references, as described above). When provided, the reference snapshot is used to drive bitmap selection instead of the MIDX code doing its own traversal. When it isn't provided, the usual traversal takes place instead. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-29 01:55:07 +00:00
--refs-snapshot=<path>::
With `--bitmap`, optionally specify a file which
contains a "refs snapshot" taken prior to repacking.
+
A reference snapshot is composed of line-delimited OIDs corresponding to
the reference tips, usually taken by `git repack` prior to generating a
new pack. A line may optionally start with a `+` character to indicate
that the reference which corresponds to that OID is "preferred" (see
linkgit:git-config[1]'s `pack.preferBitmapTips`.)
+
The file given at `<path>` is expected to be readable, and can contain
duplicates. (If a given OID is given more than once, it is marked as
preferred if at least one instance of it begins with the special `+`
marker).
--
verify::
Verify the contents of the MIDX file.
expire::
Delete the pack-files that are tracked by the MIDX file, but
have no objects referenced by the MIDX (with the exception of
`.keep` packs and cruft packs). Rewrite the MIDX file afterward
to remove all references to these pack-files.
multi-pack-index: prepare 'repack' subcommand In an environment where the multi-pack-index is useful, it is due to many pack-files and an inability to repack the object store into a single pack-file. However, it is likely that many of these pack-files are rather small, and could be repacked into a slightly larger pack-file without too much effort. It may also be important to ensure the object store is highly available and the repack operation does not interrupt concurrent git commands. Introduce a 'repack' subcommand to 'git multi-pack-index' that takes a '--batch-size' option. The subcommand will inspect the multi-pack-index for referenced pack-files whose size is smaller than the batch size, until collecting a list of pack-files whose sizes sum to larger than the batch size. Then, a new pack-file will be created containing the objects from those pack-files that are referenced by the multi-pack-index. The resulting pack is likely to actually be smaller than the batch size due to compression and the fact that there may be objects in the pack- files that have duplicate copies in other pack-files. The current change introduces the command-line arguments, and we add a test that ensures we parse these options properly. Since we specify a small batch size, we will guarantee that future implementations do not change the list of pack-files. In addition, we hard-code the modified times of the packs in the pack directory to ensure the list of packs sorted by modified time matches the order if sorted by size (ascending). This will be important in a future test. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-10 23:35:26 +00:00
repack::
Create a new pack-file containing objects in small pack-files
referenced by the multi-pack-index. If the size given by the
`--batch-size=<size>` argument is zero, then create a pack
containing all objects referenced by the multi-pack-index. For
a non-zero batch size, Select the pack-files by examining packs
from oldest-to-newest, computing the "expected size" by counting
the number of objects in the pack referenced by the
multi-pack-index, then divide by the total number of objects in
the pack and multiply by the pack size. We select packs with
expected size below the batch size until the set of packs have
multi-pack-index: repack batches below --batch-size The --batch-size=<size> option of 'git multi-pack-index repack' is intended to limit the amount of work done by the repack. In the case of a large repository, this command should repack a number of small pack-files but leave the large pack-files alone. Most often, the repository has one large pack-file from a 'git clone' operation and number of smaller pack-files from incremental 'git fetch' operations. The issue with '--batch-size' is that it also _prevents_ the repack from happening if the expected size of the resulting pack-file is too small. This was intended as a way to avoid frequent churn of small pack-files, but it has mostly caused confusion when a repository is of "medium" size. That is, not enormous like the Windows OS repository, but also not so small that this incremental repack isn't valuable. The solution presented here is to collect pack-files for repack if their expected size is smaller than the batch-size parameter until either the total expected size exceeds the batch-size or all pack-files are considered. If there are at least two pack-files, then these are combined to a new pack-file whose size should not be too much larger than the batch-size. This new strategy should succeed in keeping the number of pack-files small in these "medium" size repositories. The concern about churn is likely not interesting, as the real control over that is the frequency in which the repack command is run. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-11 15:30:18 +00:00
total expected size at least the batch size, or all pack-files
are considered. If only one pack-file is selected, then do
nothing. If a new pack-file is created, rewrite the
multi-pack-index to reference the new pack-file. A later run of
'git multi-pack-index expire' will delete the pack-files that
were part of this batch.
+
If `repack.packKeptObjects` is `false`, then any pack-files with an
associated `.keep` file will not be selected for the batch to repack.
multi-pack-index: prepare 'repack' subcommand In an environment where the multi-pack-index is useful, it is due to many pack-files and an inability to repack the object store into a single pack-file. However, it is likely that many of these pack-files are rather small, and could be repacked into a slightly larger pack-file without too much effort. It may also be important to ensure the object store is highly available and the repack operation does not interrupt concurrent git commands. Introduce a 'repack' subcommand to 'git multi-pack-index' that takes a '--batch-size' option. The subcommand will inspect the multi-pack-index for referenced pack-files whose size is smaller than the batch size, until collecting a list of pack-files whose sizes sum to larger than the batch size. Then, a new pack-file will be created containing the objects from those pack-files that are referenced by the multi-pack-index. The resulting pack is likely to actually be smaller than the batch size due to compression and the fact that there may be objects in the pack- files that have duplicate copies in other pack-files. The current change introduces the command-line arguments, and we add a test that ensures we parse these options properly. Since we specify a small batch size, we will guarantee that future implementations do not change the list of pack-files. In addition, we hard-code the modified times of the packs in the pack directory to ensure the list of packs sorted by modified time matches the order if sorted by size (ascending). This will be important in a future test. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-10 23:35:26 +00:00
EXAMPLES
--------
* Write a MIDX file for the packfiles in the current `.git` directory.
+
-----------------------------------------------
$ git multi-pack-index write
-----------------------------------------------
* Write a MIDX file for the packfiles in the current `.git` directory with a
corresponding bitmap.
+
-------------------------------------------------------------
$ git multi-pack-index write --preferred-pack=<pack> --bitmap
-------------------------------------------------------------
* Write a MIDX file for the packfiles in an alternate object store.
+
-----------------------------------------------
$ git multi-pack-index --object-dir <alt> write
-----------------------------------------------
* Verify the MIDX file for the packfiles in the current `.git` directory.
+
-----------------------------------------------
$ git multi-pack-index verify
-----------------------------------------------
SEE ALSO
--------
See link:technical/multi-pack-index.html[The Multi-Pack-Index Design
Document] and linkgit:gitformat-pack[5] for more information on the
multi-pack-index feature and its file format.
GIT
---
Part of the linkgit:git[1] suite