1
0
mirror of https://github.com/git/git synced 2024-07-05 00:58:49 +00:00
Commit Graph

557 Commits

Author SHA1 Message Date
Junio C Hamano
ffa47b75cf Merge branch 'tb/pseudo-merge-reachability-bitmap'
The pseudo-merge reachability bitmap to help more efficient storage
of the reachability bitmap in a repository with too many refs has
been added.

* tb/pseudo-merge-reachability-bitmap: (26 commits)
  pack-bitmap.c: ensure pseudo-merge offset reads are bounded
  Documentation/technical/bitmap-format.txt: add missing position table
  t/perf: implement performance tests for pseudo-merge bitmaps
  pseudo-merge: implement support for finding existing merges
  ewah: `bitmap_equals_ewah()`
  pack-bitmap: extra trace2 information
  pack-bitmap.c: use pseudo-merges during traversal
  t/test-lib-functions.sh: support `--notick` in `test_commit_bulk()`
  pack-bitmap: implement test helpers for pseudo-merge
  ewah: implement `ewah_bitmap_popcount()`
  pseudo-merge: implement support for reading pseudo-merge commits
  pack-bitmap.c: read pseudo-merge extension
  pseudo-merge: scaffolding for reads
  pack-bitmap: extract `read_bitmap()` function
  pack-bitmap-write.c: write pseudo-merge table
  pseudo-merge: implement support for selecting pseudo-merge commits
  config: introduce `git_config_double()`
  pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public
  pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()`
  pack-bitmap-write: support storing pseudo-merge commits
  ...
2024-06-24 16:39:13 -07:00
Junio C Hamano
988499e295 Merge branch 'ps/refs-without-the-repository-updates'
Further clean-up the refs subsystem to stop relying on
the_repository, and instead use the repository associated to the
ref_store object.

* ps/refs-without-the-repository-updates:
  refs/packed: remove references to `the_hash_algo`
  refs/files: remove references to `the_hash_algo`
  refs/files: use correct repository
  refs: remove `dwim_log()`
  refs: drop `git_default_branch_name()`
  refs: pass repo when peeling objects
  refs: move object peeling into "object.c"
  refs: pass ref store when detecting dangling symrefs
  refs: convert iteration over replace refs to accept ref store
  refs: retrieve worktree ref stores via associated repository
  refs: refactor `resolve_gitlink_ref()` to accept a repository
  refs: pass repo when retrieving submodule ref store
  refs: track ref stores via strmap
  refs: implement releasing ref storages
  refs: rename `init_db` callback to avoid confusion
  refs: adjust names for `init` and `init_db` callbacks
2024-05-30 14:15:13 -07:00
Junio C Hamano
ee8537ebc9 Merge branch 'tb/pack-bitmap-write-cleanups'
The pack bitmap code saw some clean-up to prepare for a follow-up topic.

* tb/pack-bitmap-write-cleanups:
  pack-bitmap: introduce `bitmap_writer_free()`
  pack-bitmap-write.c: avoid uninitialized 'write_as' field
  pack-bitmap: drop unused `max_bitmaps` parameter
  pack-bitmap: avoid use of static `bitmap_writer`
  pack-bitmap-write.c: move commit_positions into commit_pos fields
  object.h: add flags allocated by pack-bitmap.h
2024-05-28 11:17:07 -07:00
Taylor Blau
4722e06edc pack-bitmap: move some initialization to bitmap_writer_init()
The pack-bitmap-writer machinery uses a oidmap (backed by khash.h) to
map from commits selected for bitmaps (by OID) to a bitmapped_commit
structure (containing the bitmap itself, among other things like its XOR
offset, etc.)

This map was initialized at the end of `bitmap_writer_build()`. New
entries are added in `pack-bitmap-write.c::store_selected()`, which is
called by the bitmap_builder machinery (which is responsible for
traversing history and generating the actual bitmaps).

Reorganize when this field is initialized and when entries are added to
it so that we can quickly determine whether a commit is a candidate for
pseudo-merge selection, or not (since it was already selected to receive
a bitmap, and thus storing it in a pseudo-merge would be redundant).

The changes are as follows:

  - Introduce a new `bitmap_writer_init()` function which initializes
    the `writer.bitmaps` field (instead of waiting until the end of
    `bitmap_writer_build()`).

  - Add map entries in `push_bitmapped_commit()` (which is called via
    `bitmap_writer_select_commits()`) with OID keys and NULL values to
    track whether or not we *expect* to write a bitmap for some given
    commit.

  - Validate that a NULL entry is found matching the given key when we
    store a selected bitmap.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-24 11:40:41 -07:00
Patrick Steinhardt
30aaff437f refs: pass repo when peeling objects
Both `peel_object()` and `peel_iterated_oid()` implicitly rely on
`the_repository` to look up objects. Despite the fact that we want to
get rid of `the_repository`, it also leads to some restrictions in our
ref iterators when trying to retrieve the peeled value for a repository
other than `the_repository`.

Refactor these functions such that both take a repository as argument
and remove the now-unnecessary restrictions.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-17 10:33:39 -07:00
Taylor Blau
85f360fee5 pack-bitmap: introduce bitmap_writer_free()
Now that there is clearer memory ownership around the bitmap_writer
structure, introduce a bitmap_writer_free() function that callers may
use to free any memory associated with their instance of the
bitmap_writer structure.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-15 06:53:46 -07:00
Taylor Blau
9675b06917 pack-bitmap: drop unused max_bitmaps parameter
The `max_bitmaps` parameter in `bitmap_writer_select_commits()` was
introduced back in 7cc8f97108 (pack-objects: implement bitmap writing,
2013-12-21), making it original to the bitmap implementation in Git
itself.

When that patch was merged via 0f9e62e084 (Merge branch
'jk/pack-bitmap', 2014-02-27), its sole caller in builtin/pack-objects.c
passed a value of "-1" for `max_bitmaps`, indicating no limit.

Since then, the only other caller (in midx.c, added via c528e17966
(pack-bitmap: write multi-pack bitmaps, 2021-08-31)) also uses a value
of "-1" for `max_bitmaps`.

Since no callers have needed a finite limit for the `max_bitmaps`
parameter in the nearly decade that has passed since 0f9e62e084, let's
remove the parameter and any dead pieces of code connected to it.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-15 06:52:32 -07:00
Taylor Blau
07647c92ff pack-bitmap: avoid use of static bitmap_writer
The pack-bitmap machinery uses a structure called 'bitmap_writer' to
collect the data necessary to write out .bitmap files. Since its
introduction in 7cc8f97108 (pack-objects: implement bitmap writing,
2013-12-21), there has been a single static bitmap_writer structure,
which is responsible for all bitmap writing-related operations.

In practice, this is OK, since we are only ever writing a single .bitmap
file in a single process (e.g., `git multi-pack-index write --bitmap`,
`git pack-objects --write-bitmap-index`, `git repack -b`, etc.).

However, having a single static variable makes issues like data
ownership unclear, when to free variables, what has/hasn't been
initialized unclear.

Refactor this code to be written in terms of a given bitmap_writer
structure instead of relying on a static global.

Note that this exposes the structure definition of the bitmap_writer at
the pack-bitmap.h level. We could work around this by, e.g., forcing
callers to declare their writers as:

    struct bitmap_writer *writer;
    bitmap_writer_init(&bitmap_writer);

and then declaring `bitmap_writer_init()` as taking in a double-pointer
like so:

    void bitmap_writer_init(struct bitmap_writer **writer);

which would avoid us having to expose the definition of the structure
itself. This patch takes a different approach, since future patches
(like for the ongoing pseudo-merge bitmaps work) will want to modify the
innards of this structure (in the previous example, via pseudo-merge.c).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-15 06:52:32 -07:00
Patrick Steinhardt
2e5c4758b7 cocci: apply rules to rewrite callers of "refs" interfaces
Apply the rules that rewrite callers of "refs" interfaces to explicitly
pass `struct ref_store`. The resulting patch has been applied with the
`--whitespace=fix` option.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07 10:06:59 -07:00
Junio C Hamano
1002f28a52 Merge branch 'eb/hash-transition'
Work to support a repository that work with both SHA-1 and SHA-256
hash algorithms has started.

* eb/hash-transition: (30 commits)
  t1016-compatObjectFormat: add tests to verify the conversion between objects
  t1006: test oid compatibility with cat-file
  t1006: rename sha1 to oid
  test-lib: compute the compatibility hash so tests may use it
  builtin/ls-tree: let the oid determine the output algorithm
  object-file: handle compat objects in check_object_signature
  tree-walk: init_tree_desc take an oid to get the hash algorithm
  builtin/cat-file: let the oid determine the output algorithm
  rev-parse: add an --output-object-format parameter
  repository: implement extensions.compatObjectFormat
  object-file: update object_info_extended to reencode objects
  object-file-convert: convert commits that embed signed tags
  object-file-convert: convert commit objects when writing
  object-file-convert: don't leak when converting tag objects
  object-file-convert: convert tag objects when writing
  object-file-convert: add a function to convert trees between algorithms
  object: factor out parse_mode out of fast-import and tree-walk into in object.h
  cache: add a function to read an OID of a specific algorithm
  tag: sign both hashes
  commit: export add_header_signature to support handling signatures on tags
  ...
2024-03-28 14:13:50 -07:00
Taylor Blau
23c1e71369 pack-objects: enable multi-pack reuse via feature.experimental
Now that multi-pack reuse is supported, enable it via the
feature.experimental configuration in addition to the classic
`pack.allowPackReuse`.

This will allow more users to experiment with the new behavior who might
not otherwise be aware of the existing `pack.allowPackReuse`
configuration option.

The enum with values NO_PACK_REUSE, SINGLE_PACK_REUSE, and
MULTI_PACK_REUSE is defined statically in builtin/pack-objects.c's
compilation unit. We could hoist that enum into a scope visible from the
repository_settings struct, and then use that enum value in
pack-objects. Instead, define a single int that indicates what
pack-objects's default value should be to avoid additional unnecessary
code movement.

Though `feature.experimental` implies `pack.allowPackReuse=multi`, this
can still be overridden by explicitly setting the latter configuration
to either "single" or "false". Tests covering all of these cases are
showin t5332.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-05 15:27:01 -08:00
Junio C Hamano
0fea6b73f1 Merge branch 'tb/multi-pack-verbatim-reuse'
Streaming spans of packfile data used to be done only from a
single, primary, pack in a repository with multiple packfiles.  It
has been extended to allow reuse from other packfiles, too.

* tb/multi-pack-verbatim-reuse: (26 commits)
  t/perf: add performance tests for multi-pack reuse
  pack-bitmap: enable reuse from all bitmapped packs
  pack-objects: allow setting `pack.allowPackReuse` to "single"
  t/test-lib-functions.sh: implement `test_trace2_data` helper
  pack-objects: add tracing for various packfile metrics
  pack-bitmap: prepare to mark objects from multiple packs for reuse
  pack-revindex: implement `midx_pair_to_pack_pos()`
  pack-revindex: factor out `midx_key_to_pack_pos()` helper
  midx: implement `midx_preferred_pack()`
  git-compat-util.h: implement checked size_t to uint32_t conversion
  pack-objects: include number of packs reused in output
  pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse
  pack-objects: prepare `write_reused_pack()` for multi-pack reuse
  pack-objects: pass `bitmapped_pack`'s to pack-reuse functions
  pack-objects: keep track of `pack_start` for each reuse pack
  pack-objects: parameterize pack-reuse routines over a single pack
  pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()`
  pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature
  ewah: implement `bitmap_is_empty()`
  pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions
  ...
2024-01-12 16:09:56 -08:00
Junio C Hamano
492ee03f60 Merge branch 'en/header-cleanup'
Remove unused header "#include".

* en/header-cleanup:
  treewide: remove unnecessary includes in source files
  treewide: add direct includes currently only pulled in transitively
  trace2/tr2_tls.h: remove unnecessary include
  submodule-config.h: remove unnecessary include
  pkt-line.h: remove unnecessary include
  line-log.h: remove unnecessary include
  http.h: remove unnecessary include
  fsmonitor--daemon.h: remove unnecessary includes
  blame.h: remove unnecessary includes
  archive.h: remove unnecessary include
  treewide: remove unnecessary includes in source files
  treewide: remove unnecessary includes from header files
2024-01-08 14:05:15 -08:00
Elijah Newren
eea0e59ffb treewide: remove unnecessary includes in source files
Each of these were checked with
   gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).

...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file.  These cases were:
  * builtin/credential-cache.c
  * builtin/pull.c
  * builtin/send-pack.c

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26 12:04:31 -08:00
Taylor Blau
af626ac0e0 pack-bitmap: enable reuse from all bitmapped packs
Now that both the pack-bitmap and pack-objects code are prepared to
handle marking and using objects from multiple bitmapped packs for
verbatim reuse, allow marking objects from all bitmapped packs as
eligible for reuse.

Within the `reuse_partial_packfile_from_bitmap()` function, we no longer
only mark the pack whose first object is at bit position zero for reuse,
and instead mark any pack contained in the MIDX as a reuse candidate.

Provide a handful of test cases in a new script (t5332) exercising
interesting behavior for multi-pack reuse to ensure that we performed
all of the previous steps correctly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:09 -08:00
Taylor Blau
941074134c pack-objects: allow setting pack.allowPackReuse to "single"
In e704fc7978 (pack-objects: introduce pack.allowPackReuse, 2019-12-18),
the `pack.allowPackReuse` configuration option was introduced, allowing
users to disable the pack reuse mechanism.

To prepare for debugging multi-pack reuse, allow setting configuration
to "single" in addition to the usual bool-or-int values.

"single" implies the same behavior as "true", "1", "yes", and so on. But
it will complement a new "multi" value (to be introduced in a future
commit). When set to "single", we will only perform pack reuse on a
single pack, regardless of whether or not there are multiple MIDX'd
packs.

This requires no code changes (yet), since we only support single pack
reuse.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:09 -08:00
Taylor Blau
54393e4e68 pack-objects: add tracing for various packfile metrics
As part of the multi-pack reuse effort, we will want to add some tests
that assert that we reused a certain number of objects from a certain
number of packs.

We could do this by grepping through the stderr output of
`pack-objects`, but doing so would be brittle in case the output format
changed.

Instead, let's use the trace2 mechanism to log various pieces of
information about the generated packfile, which we can then use to
compare against desired values.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:09 -08:00
Taylor Blau
b96289a10b pack-objects: include number of packs reused in output
In addition to including the number of objects reused verbatim from a
reuse-pack, include the number of packs from which objects were reused.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:08 -08:00
Taylor Blau
ca0fd69e37 pack-objects: prepare write_reused_pack_verbatim() for multi-pack reuse
The function `write_reused_pack_verbatim()` within
`builtin/pack-objects.c` is responsible for writing out a continuous
set of objects beginning at the start of the reuse packfile.

In the existing implementation, we did something like:

    while (pos < reuse_packfile_bitmap->word_alloc &&
           reuse_packfile_bitmap->words[pos] == (eword_t)~0)
      pos++;

    if (pos)
      /* write first `pos * BITS_IN_WORD` objects from pack */

as an optimization to record a single chunk for the longest continuous
prefix of objects wanted out of the reuse pack, instead of having a
chunk for each individual object. For more details, see bb514de356
(pack-objects: improve partial packfile reuse, 2019-12-18).

In order to retain this optimization in a multi-pack reuse world, we can
no longer assume that the first object in a pack is on a word boundary
in the bitmap storing the set of reusable objects.

Assuming that all objects from the beginning of the reuse packfile up to
the object corresponding to the first bit on a word boundary are part of
the result, consume whole words at a time until the last whole word
belonging to the reuse packfile. Copy those objects to the resulting
packfile, and track that we reused them by recording a single chunk.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:08 -08:00
Taylor Blau
4805125710 pack-objects: prepare write_reused_pack() for multi-pack reuse
The function `write_reused_pack()` within `builtin/pack-objects.c` is
responsible for performing pack-reuse on a single pack, and has two main
functions:

  - it dispatches a call to `write_reused_pack_verbatim()` to see if we
    can reuse portions of the packfile in whole-word chunks

  - for any remaining objects (that is, any objects that appear after
    the first "gap" in the bitmap), call write_reused_pack_one() on that
    object to record it for reuse.

Prepare this function for multi-pack reuse by removing the assumption
that the bit position corresponding to the first object being reused
from a given pack must be at bit position zero.

The changes in this function are mostly straightforward. Initialize `i`
to the position of the first word to contain bits corresponding to that
reuse pack. In most situations, we throw the initialized value away,
since we end up replacing it with the return value from
write_reused_pack_verbatim(), moving us past the section of whole words
that we reused.

Likewise, modify the per-object loop to ignore any bits at the beginning
of the first word that do not belong to the pack currently being reused,
as well as skip to the "done" section once we have processed the last
bit corresponding to this pack.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:08 -08:00
Taylor Blau
073b40eba0 pack-objects: pass bitmapped_pack's to pack-reuse functions
Further prepare pack-objects to perform verbatim pack-reuse over
multiple packfiles by converting functions that take in a pointer to a
`struct packed_git` to instead take in a pointer to a `struct
bitmapped_pack`.

The additional information found in the bitmapped_pack struct (such as
the bit position corresponding to the beginning of the pack) will be
necessary in order to perform verbatim pack-reuse.

Note that we don't use any of the extra pieces of information contained
in the bitmapped_pack struct, so this step is merely preparatory and
does not introduce any functional changes.

Note further that we do not change the argument type to
write_reused_pack_one(). That function is responsible for copying
sections of the packfile directly and optionally patching any OFS_DELTAs
to account for not reusing sections of the packfile in between a delta
and its base.

As such, that function is (and should remain) oblivious to multi-pack
reuse, and does not require any of the extra pieces of information
stored in the bitmapped_pack struct.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:08 -08:00
Taylor Blau
d1d701eb9c pack-objects: keep track of pack_start for each reuse pack
When reusing objects from a pack, we keep track of a set of one or more
`reused_chunk`s, corresponding to sections of one or more object(s) from
a source pack that we are reusing. Each chunk contains two pieces of
information:

  - the offset of the first object in the source pack (relative to the
    beginning of the source pack)
  - the difference between that offset, and the corresponding offset in
    the pack we're generating

The purpose of keeping track of these is so that we can patch an
OFS_DELTAs that cross over a section of the reuse pack that we didn't
take.

For instance, consider a hypothetical pack as shown below:

                                                (chunk #2)
                                                __________...
                                               /
                                              /
      +--------+---------+-------------------+---------+
  ... | <base> | <other> |      (unused)     | <delta> | ...
      +--------+---------+-------------------+---------+
       \                /
        \______________/
           (chunk #1)

Suppose that we are sending objects "base", "other", and "delta", and
that the "delta" object is stored as an OFS_DELTA, and that its base is
"base". If we don't send any objects in the "(unused)" range, we can't
copy the delta'd object directly, since its delta offset includes a
range of the pack that we didn't copy, so we have to account for that
difference when patching and reassembling the delta.

In order to compute this value correctly, we need to know not only where
we are in the packfile we're assembling (with `hashfile_total(f)`) but
also the position of the first byte of the packfile that we are
currently reusing. Currently, this works just fine, since when reusing
only a single pack those two values are always identical (because
verbatim reuse is the first thing pack-objects does when enabled after
writing the pack header).

But when reusing multiple packs which have one or more gaps, we'll need
to account for these two values diverging.

Together, these two allow us to compute the reused chunk's offset
difference relative to the start of the reused pack, as desired.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:08 -08:00
Taylor Blau
5e29c3f707 pack-objects: parameterize pack-reuse routines over a single pack
The routines pack-objects uses to perform verbatim pack-reuse are:

  - write_reused_pack_one()
  - write_reused_pack_verbatim()
  - write_reused_pack()

, all of which assume that there is exactly one packfile being reused:
the global constant `reuse_packfile`.

Prepare for reusing objects from multiple packs by making reuse packfile
a parameter of each of the above functions in preparation for calling
these functions in a loop with multiple packfiles.

Note that we still have the global "reuse_packfile", but pass it through
each of the above function's parameter lists, eliminating all but one
direct access (the top-level caller in `write_pack_file()`). Even after
this series, we will still have a global, but it will hold the array of
reusable packfiles, and we'll pass them one at a time to these functions
in a loop.

Note also that we will eventually need to pass a `bitmapped_pack`
instead of a `packed_git` in order to hold onto additional information
required for reuse (such as the bit position of the first object
belonging to that pack). But that change will be made in a future commit
so as to minimize the noise below as much as possible.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:08 -08:00
Taylor Blau
83296d20e8 pack-bitmap: return multiple packs via reuse_partial_packfile_from_bitmap()
Further prepare for enabling verbatim pack-reuse over multiple packfiles
by changing the signature of reuse_partial_packfile_from_bitmap() to
populate an array of `struct bitmapped_pack *`'s instead of a pointer to
a single packfile.

Since the array we're filling out is sized dynamically[^1], add an
additional `size_t *` parameter which will hold the number of reusable
packs (equal to the number of elements in the array).

Note that since we still have not implemented true multi-pack reuse,
these changes aren't propagated out to the rest of the caller in
builtin/pack-objects.c.

In the interim state, we expect that the array has a single element, and
we use that element to fill out the static `reuse_packfile` variable
(which is a bog-standard `struct packed_git *`). Future commits will
continue to push this change further out through the pack-objects code.

[^1]: That is, even though we know the number of packs which are
  candidates for pack-reuse, we do not know how many of those
  candidates we can actually reuse.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:08 -08:00
Taylor Blau
35e156b9de pack-bitmap: simplify reuse_partial_packfile_from_bitmap() signature
The signature of `reuse_partial_packfile_from_bitmap()` currently takes
in a bitmap, as well as three output parameters (filled through
pointers, and passed as arguments), and also returns an integer result.

The output parameters are filled out with: (a) the packfile used for
pack-reuse, (b) the number of objects from that pack that we can reuse,
and (c) a bitmap indicating which objects we can reuse. The return value
is either -1 (when there are no objects to reuse), or 0 (when there is
at least one object to reuse).

Some of these parameters are redundant. Notably, we can infer from the
bitmap how many objects are reused by calling bitmap_popcount(). And we
can similar compute the return value based on that number as well.

As such, clean up the signature of this function to drop the "*entries"
parameter, as well as the int return value, since the single caller of
this function can infer these values themself.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:08 -08:00
Taylor Blau
66f0c71073 pack-objects: free packing_data in more places
The pack-objects internals use a packing_data struct to track what
objects are part of the pack(s) being formed.

Since these structures contain allocated fields, failing to
appropriately free() them results in a leak. Plug that leak by
introducing a clear_packing_data() function, and call it in the
appropriate spots.

This is a fairly straightforward leak to plug, since none of the callers
expect to read any values or have any references to parts of the address
space being freed.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-14 14:38:07 -08:00
Jeff King
ba176db511 config: handle NULL value when parsing non-bools
When the config parser sees an "implicit" bool like:

  [core]
  someVariable

it passes NULL to the config callback. Any callback code which expects a
string must check for NULL. This usually happens via helpers like
git_config_string(), etc, but some custom code forgets to do so and will
segfault.

These are all fairly vanilla cases where the solution is just the usual
pattern of:

  if (!value)
        return config_error_nonbool(var);

though note that in a few cases we have to split initializers like:

  int some_var = initializer();

into:

  int some_var;
  if (!value)
        return config_error_nonbool(var);
  some_var = initializer();

There are still some broken instances after this patch, which I'll
address on their own in individual patches after this one.

Reported-by: Carlos Andrés Ramírez Cataño <antaigroupltda@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09 08:24:39 +09:00
Eric W. Biederman
efed687edc tree-walk: init_tree_desc take an oid to get the hash algorithm
To make it possible for git ls-tree to display the tree encoded
in the hash algorithm of the oid specified to git ls-tree, update
init_tree_desc to take as a parameter the oid of the tree object.

Update all callers of init_tree_desc and init_tree_desc_gently
to pass the oid of the tree object.

Use the oid of the tree object to discover the hash algorithm
of the oid and store that hash algorithm in struct tree_desc.

Use the hash algorithm in decode_tree_entry and
update_tree_entry_internal to handle reading a tree object encoded in
a hash algorithm that differs from the repositories hash algorithm.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 14:57:40 -07:00
Christian Couder
6cfcabfb9f pack-objects: allow --filter without --stdout
9535ce7337 (pack-objects: add list-objects filtering, 2017-11-21)
taught `git pack-objects` to use `--filter`, but required the use of
`--stdout` since a partial clone mechanism was not yet in place to
handle missing objects. Since then, changes like 9e27beaa23
(promisor-remote: implement promisor_remote_get_direct(), 2019-06-25)
and others added support to dynamically fetch objects that were missing.

Even without a promisor remote, filtering out objects can also be useful
if we can put the filtered out objects in a separate pack, and in this
case it also makes sense for pack-objects to write the packfile directly
to an actual file rather than on stdout.

Remove the `--stdout` requirement when using `--filter`, so that in a
follow-up commit, repack can pass `--filter` to pack-objects to omit
certain objects from the resulting packfile.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 14:54:29 -07:00
Junio C Hamano
c52a02a0f0 Merge branch 'jk/unused-post-2.42-part2'
Unused parameters to functions are marked as such, and/or removed,
in order to bring us closer to -Wunused-parameter clean.

* jk/unused-post-2.42-part2:
  parse-options: mark unused parameters in noop callback
  interpret-trailers: mark unused "unset" parameters in option callbacks
  parse-options: add more BUG_ON() annotations
  merge: do not pass unused opt->value parameter
  parse-options: mark unused "opt" parameter in callbacks
  parse-options: prefer opt->value to globals in callbacks
  checkout-index: delay automatic setting of to_tempfile
  format-patch: use OPT_STRING_LIST for to/cc options
  merge: simplify parsing of "-n" option
  merge: make xopts a strvec
2023-09-13 10:07:56 -07:00
Jeff King
34bf44f2d5 parse-options: mark unused "opt" parameter in callbacks
The previous commit argued that parse-options callbacks should try to
use opt->value rather than touching globals directly. In some cases,
however, that's awkward to do. Some callbacks touch multiple variables,
or may even just call into an abstracted function that does so.

In some of these cases we _could_ convert them by stuffing the multiple
variables into a single struct and passing the struct pointer through
opt->value. But that may make other parts of the code less readable,
as the struct relationship has to be mentioned everywhere.

Let's just accept that these cases are special and leave them as-is. But
we do need to mark their "opt" parameters to satisfy -Wunused-parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-05 14:48:17 -07:00
Jeff King
66e3309294 parse-options: prefer opt->value to globals in callbacks
We have several parse-options callbacks that ignore their "opt"
parameters entirely. This is a little unusual, as we'd normally put the
result of the parsing into opt->value. In the case of these callbacks,
though, they directly manipulate global variables instead (and in
most cases the caller sets opt->value to NULL in the OPT_CALLBACK
declaration).

The immediate symptom we'd like to deal with is that the unused "opt"
variables trigger -Wunused-parameter. But how to fix that is debatable.
One option is to annotate them with UNUSED. But another is to have the
caller pass in the appropriate variable via opt->value, and use it. That
has the benefit of making the callbacks reusable (in theory at least),
and makes it clear from the OPT_CALLBACK declaration which variables
will be affected (doubly so for the cases in builtin/fast-export.c,
where we do set opt->value, but it is completely ignored!).

The slight downside is that we lose type safety, since they're now
passing through void pointers.

I went with the "just use them" approach here. The loss of type safety
is unfortunate, but that is already an issue with most of the other
callbacks. If we want to try to address that, we should do so more
consistently (and this patch would prepare these callbacks for whatever
we choose to do there).

Note that in the cases in builtin/fast-export.c, we are passing
anonymous enums. We'll have to give them names so that we can declare
the appropriate pointer type within the callbacks.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-09-05 14:48:17 -07:00
Taylor Blau
61568efa95 builtin/pack-objects.c: support --max-pack-size with --cruft
When pack-objects learned the `--cruft` option back in b757353676
(builtin/pack-objects.c: --cruft without expiration, 2022-05-20), we
explicitly forbade `--cruft` with `--max-pack-size`.

At the time, there was no specific rationale given in the patch for not
supporting the `--max-pack-size` option with `--cruft`. (As best I can
remember, it's because we were trying to push users towards only ever
having a single cruft pack, but I cannot be sure).

However, `--max-pack-size` is flexible enough that it already works with
`--cruft` and can shard unreachable objects across multiple cruft packs,
creating separate ".mtimes" files as appropriate. In fact, the
`--max-pack-size` option worked with `--cruft` as far back as
b757353676!

This is because we overwrite the `written_list`, and pass down the
appropriate length, i.e. the number of objects written in each pack
shard.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-29 11:58:06 -07:00
Taylor Blau
e741c07872 builtin/pack-objects.c: remove unnecessary strbuf_reset()
When reading input with the `--cruft` option, `git pack-objects` reads
each line into a strbuf, and then moves it to either the list of
discarded or fresh packs, depending on whether or not the input line
starts with a '-' character.

At the beginning of each loop iteration, the next line of input is read
with `strbuf_getline()`, which calls `strbuf_reset()` (as a part of
`strbuf_getwholeline()`) before reading the next line of input.

Thus, the call to `strbuf_reset()` (added back in b757353676
(builtin/pack-objects.c: --cruft without expiration, 2022-05-20)) at the
end of the loop is unnecessary, so let's remove it accordingly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-29 10:26:44 -07:00
Junio C Hamano
ddcb8fd8b9 Merge branch 'rs/pack-objects-parseopt-fix'
Command line parser fix.

* rs/pack-objects-parseopt-fix:
  pack-objects: fix --no-quiet
  pack-objects: fix --no-keep-true-parents
2023-07-28 09:45:22 -07:00
René Scharfe
36f76d2a25 pack-objects: fix --no-quiet
Since 99fb6e04cb (pack-objects: convert to use parse_options(),
2012-02-01) git pack-objects has accepted the option --no-quiet, but it
does the same as --quiet.  That's because it's defined using OPT_SET_INT
with a value of 0, which sets 0 when negated, too.

Make --no-quiet equivalent to --progress and ignore it if --all-progress
was given.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-21 10:04:04 -07:00
René Scharfe
3a5f308741 pack-objects: fix --no-keep-true-parents
Since 99fb6e04cb (pack-objects: convert to use parse_options(),
2012-02-01) git pack-objects has accepted --no-keep-true-parents, but
this option does the same as --keep-true-parents.  That's because it's
defined using OPT_SET_INT with a value of 0, which sets 0 when negated
as well.

Turn --no-keep-true-parents into the opposite of --keep-true-parents by
using OPT_BOOL and storing the option's status directly in a variable
named "grafts_keep_true_parents" instead of in negative form in
"grafts_replace_parents".

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-21 10:02:59 -07:00
Junio C Hamano
ce481ac8b3 Merge branch 'cw/compat-util-header-cleanup'
Further shuffling of declarations across header files to streamline
file dependencies.

* cw/compat-util-header-cleanup:
  git-compat-util: move alloc macros to git-compat-util.h
  treewide: remove unnecessary includes for wrapper.h
  kwset: move translation table from ctype
  sane-ctype.h: create header for sane-ctype macros
  git-compat-util: move wrapper.c funcs to its header
  git-compat-util: move strbuf.c funcs to its header
2023-07-17 11:30:42 -07:00
Junio C Hamano
b3d1c85d48 Merge branch 'gc/config-context'
Reduce reliance on a global state in the config reading API.

* gc/config-context:
  config: pass source to config_parser_event_fn_t
  config: add kvi.path, use it to evaluate includes
  config.c: remove config_reader from configsets
  config: pass kvi to die_bad_number()
  trace2: plumb config kvi
  config.c: pass ctx with CLI config
  config: pass ctx with config files
  config.c: pass ctx in configsets
  config: add ctx arg to config_fn_t
  urlmatch.h: use config_fn_t type
  config: inline git_color_default_config
2023-07-06 11:54:48 -07:00
Calvin Wan
91c080dff5 git-compat-util: move alloc macros to git-compat-util.h
alloc_nr, ALLOC_GROW, and ALLOC_GROW_BY are commonly used macros for
dynamic array allocation. Moving these macros to git-compat-util.h with
the other alloc macros focuses alloc.[ch] to allocation for Git objects
and additionally allows us to remove inclusions to alloc.h from files
that solely used the above macros.

Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05 11:42:31 -07:00
Calvin Wan
da9502ff4d treewide: remove unnecessary includes for wrapper.h
Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-05 11:41:59 -07:00
Junio C Hamano
a1264a08a1 Merge branch 'en/header-split-cache-h-part-3'
Header files cleanup.

* en/header-split-cache-h-part-3: (28 commits)
  fsmonitor-ll.h: split this header out of fsmonitor.h
  hash-ll, hashmap: move oidhash() to hash-ll
  object-store-ll.h: split this header out of object-store.h
  khash: name the structs that khash declares
  merge-ll: rename from ll-merge
  git-compat-util.h: remove unneccessary include of wildmatch.h
  builtin.h: remove unneccessary includes
  list-objects-filter-options.h: remove unneccessary include
  diff.h: remove unnecessary include of oidset.h
  repository: remove unnecessary include of path.h
  log-tree: replace include of revision.h with simple forward declaration
  cache.h: remove this no-longer-used header
  read-cache*.h: move declarations for read-cache.c functions from cache.h
  repository.h: move declaration of the_index from cache.h
  merge.h: move declarations for merge.c from cache.h
  diff.h: move declaration for global in diff.c from cache.h
  preload-index.h: move declarations for preload-index.c from elsewhere
  sparse-index.h: move declarations for sparse-index.c from cache.h
  name-hash.h: move declarations for name-hash.c from cache.h
  run-command.h: move declarations for run-command.c from cache.h
  ...
2023-06-29 16:43:21 -07:00
Glen Choo
8868b1ebfb config: pass kvi to die_bad_number()
Plumb "struct key_value_info" through all code paths that end in
die_bad_number(), which lets us remove the helper functions that read
analogous values from "struct config_reader". As a result, nothing reads
config_reader.config_kvi any more, so remove that too.

In config.c, this requires changing the signature of
git_configset_get_value() to 'return' "kvi" in an out parameter so that
git_configset_get_<type>() can pass it to git_config_<type>(). Only
numeric types will use "kvi", so for non-numeric types (e.g.
git_configset_get_string()), pass NULL to indicate that the out
parameter isn't needed.

Outside of config.c, config callbacks now need to pass "ctx->kvi" to any
of the git_config_<type>() functions that parse a config string into a
number type. Included is a .cocci patch to make that refactor.

The only exceptional case is builtin/config.c, where git_config_<type>()
is called outside of a config callback (namely, on user-provided input),
so config source information has never been available. In this case,
die_bad_number() defaults to a generic, but perfectly descriptive
message. Let's provide a safe, non-NULL for "kvi" anyway, but make sure
not to change the message.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28 14:06:40 -07:00
Glen Choo
a4e7e317f8 config: add ctx arg to config_fn_t
Add a new "const struct config_context *ctx" arg to config_fn_t to hold
additional information about the config iteration operation.
config_context has a "struct key_value_info kvi" member that holds
metadata about the config source being read (e.g. what kind of config
source it is, the filename, etc). In this series, we're only interested
in .kvi, so we could have just used "struct key_value_info" as an arg,
but config_context makes it possible to add/adjust members in the future
without changing the config_fn_t signature. We could also consider other
ways of organizing the args (e.g. moving the config name and value into
config_context or key_value_info), but in my experiments, the
incremental benefit doesn't justify the added complexity (e.g. a
config_fn_t will sometimes invoke another config_fn_t but with a
different config value).

In subsequent commits, the .kvi member will replace the global "struct
config_reader" in config.c, making config iteration a global-free
operation. It requires much more work for the machinery to provide
meaningful values of .kvi, so for now, merely change the signature and
call sites, pass NULL as a placeholder value, and don't rely on the arg
in any meaningful way.

Most of the changes are performed by
contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every
config_fn_t:

- Modifies the signature to accept "const struct config_context *ctx"
- Passes "ctx" to any inner config_fn_t, if needed
- Adds UNUSED attributes to "ctx", if needed

Most config_fn_t instances are easily identified by seeing if they are
called by the various config functions. Most of the remaining ones are
manually named in the .cocci patch. Manual cleanups are still needed,
but the majority of it is trivial; it's either adjusting config_fn_t
that the .cocci patch didn't catch, or adding forward declarations of
"struct config_context ctx" to make the signatures make sense.

The non-trivial changes are in cases where we are invoking a config_fn_t
outside of config machinery, and we now need to decide what value of
"ctx" to pass. These cases are:

- trace2/tr2_cfg.c:tr2_cfg_set_fl()

  This is indirectly called by git_config_set() so that the trace2
  machinery can notice the new config values and update its settings
  using the tr2 config parsing function, i.e. tr2_cfg_cb().

- builtin/checkout.c:checkout_main()

  This calls git_xmerge_config() as a shorthand for parsing a CLI arg.
  This might be worth refactoring away in the future, since
  git_xmerge_config() can call git_default_config(), which can do much
  more than just parsing.

Handle them by creating a KVI_INIT macro that initializes "struct
key_value_info" to a reasonable default, and use that to construct the
"ctx" arg.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28 14:06:39 -07:00
Junio C Hamano
d9f9f6b358 Merge branch 'ds/disable-replace-refs'
Introduce a mechanism to disable replace refs globally and per
repository.

* ds/disable-replace-refs:
  repository: create read_replace_refs setting
  replace-objects: create wrapper around setting
  repository: create disable_replace_refs()
2023-06-22 16:29:06 -07:00
Elijah Newren
a034e9106f object-store-ll.h: split this header out of object-store.h
The vast majority of files including object-store.h did not need dir.h
nor khash.h.  Split the header into two files, and let most just depend
upon object-store-ll.h, while letting the two callers that need it
depend on the full object-store.h.

After this patch:
    $ git grep -h include..object-store | sort | uniq -c
          2 #include "object-store.h"
        129 #include "object-store-ll.h"

Diff best viewed with `--color-moved`.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 13:39:54 -07:00
Derrick Stolee
d24eda4e03 repository: create disable_replace_refs()
Several builtins depend on being able to disable the replace references
so we actually operate on each object individually. These currently do
so by directly mutating the 'read_replace_refs' global.

A future change will move this global into a different place, so it will
be necessary to change all of these lines. However, we can simplify that
transition by abstracting the purpose of these global assignments with a
method call.

We will need to keep this read_replace_refs global forever, as we want
to make sure that we never use replace refs throughout the life of the
process if this method is called. Future changes may present a
repository-scoped version of the variable to represent that repository's
core.useReplaceRefs config value, but a zero-valued read_replace_refs
will always override such a setting.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-12 13:34:55 -07:00
John Cai
44451a2e5e attr: teach "--attr-source=<tree>" global option to "git"
Earlier, 47cfc9bd (attr: add flag `--source` to work with tree-ish,
2023-01-14) taught "git check-attr" the "--source=<tree>" option to
allow it to read attribute files from a tree-ish, but did so only
for the command.  Just like "check-attr" users wanted a way to use
attributes from a tree-ish and not from the working tree files,
users of other commands (like "git diff") would benefit from the
same.

Undo most of the UI change the commit made, while keeping the
internal logic to read attributes from a given tree-ish. Expose the
internal logic via a new "--attr-source=<tree>" command line option
given to "git", so that it can be used with any git command that
runs as part of the main git process.

Additionally, add an environment variable GIT_ATTR_SOURCE that is set
when --attr-source is passed in, so that subprocesses use the same value
for the attributes source tree.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-06 14:34:09 -07:00
Junio C Hamano
849c8b3dbf Merge branch 'tb/pack-revindex-on-disk'
The on-disk reverse index that allows mapping from the pack offset
to the object name for the object stored at the offset has been
enabled by default.

* tb/pack-revindex-on-disk:
  t: invert `GIT_TEST_WRITE_REV_INDEX`
  config: enable `pack.writeReverseIndex` by default
  pack-revindex: introduce `pack.readReverseIndex`
  pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
  pack-revindex: make `load_pack_revindex` take a repository
  t5325: mark as leak-free
  pack-write.c: plug a leak in stage_tmp_packfiles()
2023-04-27 16:00:59 -07:00
Junio C Hamano
36628c56ed Merge branch 'ps/fix-geom-repack-with-alternates'
Geometric repacking ("git repack --geometric=<n>") in a repository
that borrows from an alternate object database had various corner
case bugs, which have been corrected.

* ps/fix-geom-repack-with-alternates:
  repack: disable writing bitmaps when doing a local repack
  repack: honor `-l` when calculating pack geometry
  t/helper: allow chmtime to print verbosely without modifying mtime
  pack-objects: extend test coverage of `--stdin-packs` with alternates
  pack-objects: fix error when same packfile is included and excluded
  pack-objects: fix error when packing same pack twice
  pack-objects: split out `--stdin-packs` tests into separate file
  repack: fix generating multi-pack-index with only non-local packs
  repack: fix trying to use preferred pack in alternates
  midx: fix segfault with no packs and invalid preferred pack
2023-04-25 13:56:20 -07:00