Commit graph

11273 commits

Author SHA1 Message Date
Derrick Stolee b6551feadf merge-tree: load default git config
The 'git merge-tree' command handles creating root trees for merges
without using the worktree. This is a critical operation in many Git
hosts, as they typically store bare repositories.

This builtin does not load the default Git config, which can have
several important ramifications.

In particular, one config that is loaded by default is
core.useReplaceRefs. This is typically disabled in Git hosts due to
the ability to spoof commits in strange ways.

Since this config is not loaded specifically during merge-tree, users
were previously able to use refs/replace/ references to make pull
requests that looked valid but introduced malicious content. The
resulting merge commit would have the correct commit history, but the
malicious content would exist in the root tree of the merge.

The fix is simple: load the default Git config in cmd_merge_tree().
This may also fix other behaviors that are effected by reading default
config. The only possible downside is a little extra computation time
spent reading config. The config parsing is placed after basic argument
parsing so it does not slow down usage errors.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-10 12:20:44 -07:00
Patrick Steinhardt dd781e3856 fetch: introduce machine-parseable "porcelain" output format
The output of git-fetch(1) is obviously designed for consumption by
users, only: we neatly columnize data, we abbreviate reference names, we
print neat arrows and we don't provide information about actual object
IDs that have changed. This makes the output format basically unusable
in the context of scripted invocations of git-fetch(1) that want to
learn about the exact changes that the command performs.

Introduce a new machine-parseable "porcelain" output format that is
supposed to fix this shortcoming. This output format is intended to
provide information about every reference that is about to be updated,
the old object ID that the reference has been pointing to and the new
object ID it will be updated to. Furthermore, the output format provides
the same flags as the human-readable format to indicate basic conditions
for each reference update like whether it was a fast-forward update, a
branch deletion, a rejected update or others.

The output format is quite simple:

```
<flag> <old-object-id> <new-object-id> <local-reference>\n
```

We assume two conditions which are generally true:

    - The old and new object IDs have fixed known widths and cannot
      contain spaces.

    - References cannot contain newlines.

With these assumptions, the output format becomes unambiguously
parseable. Furthermore, given that this output is designed to be
consumed by scripts, the machine-readable data is printed to stdout
instead of stderr like the human-readable output is. This is mostly done
so that other data printed to stderr, like error messages or progress
meters, don't interfere with the parseable data.

A notable ommission here is that the output format does not include the
remote from which a reference was fetched, which might be important
information especially in the context of multi-remote fetches. But as
such a format would require us to print the remote for every single
reference update due to parallelizable fetches it feels wasteful for the
most likely usecase, which is when fetching from a single remote.

In a similar spirit, a second restriction is that this cannot be used
with `--recurse-submodules`. This is because any reference updates would
be ambiguous without also printing the repository in which the update
happens.

Considering that both multi-remote and submodule fetches are user-facing
features, using them in conjunction with `--porcelain` that is intended
for scripting purposes is likely not going to be useful in the majority
of cases. With that in mind these restrictions feel acceptable. If
usecases for either of these come up in the future though it is easy
enough to add a new "porcelain-v2" format that adds this information.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-10 10:35:25 -07:00
Patrick Steinhardt cdc034a0ac fetch: move option related variables into main function
The options of git-fetch(1) which we pass to `parse_options()` are
declared globally in `builtin/fetch.c`. This means we're forced to use
global variables for all the options, which is more likely to cause
confusion than explicitly passing state around.

Refactor the code to move the options into `cmd_fetch()`. Move variables
that were previously forced to be declared globally and which are only
used by `cmd_fetch()` into function-local scope.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-10 10:35:25 -07:00
Patrick Steinhardt 58afbe885c fetch: lift up parsing of "fetch.output" config variable
Parsing the display format happens inside of `display_state_init()`. As
we only need to check for a simple config entry, this is a natural
location to put this code as it means that display-state logic is neatly
contained in a single location.

We're about to introduce a new "porcelain" output format though that is
intended to be parseable by machines, for example inside of a script.
This format can be enabled by passing the `--porcelain` switch to
git-fetch(1). As a consequence, we'll have to add a second callsite that
influences the output format, which will become awkward to handle.

Refactor the code such that callers are expected to pass the display
format that is to be used into `display_state_init()`. This allows us to
lift up the code into the main function, where we can then hook it into
command line options parser in a follow-up commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-10 10:35:25 -07:00
Patrick Steinhardt 50957937f9 fetch: introduce display_format enum
We currently have two different display formats in git-fetch(1) with the
"full" and "compact" formats. This is tracked with a boolean value that
simply denotes whether the display format is supposed to be compacted
or not. This works reasonably well while there are only two formats, but
we're about to introduce another format that will make this a bit more
awkward to use.

Introduce a `enum display_format` that is more readily extensible.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-10 10:35:25 -07:00
Patrick Steinhardt 9539638a2b fetch: refactor calculation of the display table width
When displaying reference updates, we try to print the references in a
neat table. As the table's width is determined its contents we thus need
to precalculate the overall width before we can start printing updated
references.

The calculation is driven by `display_state_init()`, which invokes
`refcol_width()` for every reference that is to be printed. This split
is somewhat confusing. For one, we filter references that shall be
attributed to the overall width in both places. And second, we
needlessly recalculate the maximum line length based on the terminal
columns and display format for every reference.

Refactor the code so that the complete width calculations are neatly
contained in `refcol_width()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-10 10:35:25 -07:00
Patrick Steinhardt 1c31764dda fetch: print left-hand side when fetching HEAD:foo
`store_updated_refs()` parses the remote reference for two purposes:

    - It gets used as a note when writing FETCH_HEAD.

    - It is passed through to `display_ref_update()` to display
      updated references in the following format:

      ```
       * branch               master          -> master
      ```

In most cases, the parsed remote reference is the prettified reference
name and can thus be used for both cases. But if the remote reference is
HEAD, the parsed remote reference becomes empty. This is intended when
we write the FETCH_HEAD, where we skip writing the note in that case.
But when displaying the updated references this leads to inconsistent
output where the left-hand side of reference updates is missing in some
cases:

```
$ git fetch origin HEAD HEAD:explicit-head :implicit-head main
From https://github.com/git/git
 * branch                  HEAD       -> FETCH_HEAD
 * [new ref]                          -> explicit-head
 * [new ref]                          -> implicit-head
 * branch                  main       -> FETCH_HEAD
```

This behaviour has existed ever since the table-based output has been
introduced for git-fetch(1) via 165f390250 (git-fetch: more terse fetch
output, 2007-11-03) and was never explicitly documented either in the
commit message or in any of our tests. So while it may not be a bug per
se, it feels like a weird inconsistency and not like it was a concious
design decision.

The logic of how we compute the remote reference name that we ultimately
pass to `display_ref_update()` is not easy to follow. There are three
different cases here:

    - When the remote reference name is "HEAD" we set the remote
      reference name to the empty string. This is the case that causes
      the left-hand side to go missing, where we would indeed want to
      print "HEAD" instead of the empty string. This is what
      `prettify_refname()` would return.

    - When the remote reference name has a well-known prefix then we
      strip this prefix. This matches what `prettify_refname()` does.

    - Otherwise, we keep the fully qualified reference name. This also
      matches what `prettify_refname()` does.

As the return value of `prettify_refname()` would do the correct thing
for us in all three cases, we can thus fix the inconsistency by passing
through the full remote reference name to `display_ref_update()`, which
learns to call `prettify_refname()`. At the same time, this also
simplifies the code a bit.

Note that this patch also changes formatting of the block that computes
the "kind" (which is the category like "branch" or "tag") and "what"
(which is the prettified reference name like "master" or "v1.0")
variables. This is done on purpose so that it is part of the diff,
hopefully making the change easier to comprehend.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-10 10:35:25 -07:00
Patrick Steinhardt 5667141e3b fetch: fix --no-recurse-submodules with multi-remote fetches
When running `git fetch --no-recurse-submodules`, the exectation is that
we don't fetch any submodules. And while this works for fetches of a
single remote, it doesn't when fetching multiple remotes at once. The
result is that we do recurse into submodules even though the user has
explicitly asked us not to.

This is because while we pass on `--recurse-submodules={yes,on-demand}`
if specified by the user, we don't pass on `--no-recurse-submodules` to
the subprocess spawned to perform the submodule fetch.

Fix this by also forwarding this flag as expected.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-10 10:35:24 -07:00
Junio C Hamano 2ca91d1ee0 Merge branch 'mh/credential-oauth-refresh-token'
The credential subsystem learns to help OAuth framework.

* mh/credential-oauth-refresh-token:
  credential: new attribute oauth_refresh_token
2023-05-10 10:23:29 -07:00
Junio C Hamano 461eea3fb8 Merge branch 'ob/messages-capitalize-exception'
Message update.

* ob/messages-capitalize-exception:
  messages: capitalization and punctuation exceptions
2023-05-09 16:45:46 -07:00
Junio C Hamano ccd12a3d6c Merge branch 'en/header-split-cache-h-part-2'
More header clean-up.

* en/header-split-cache-h-part-2: (22 commits)
  reftable: ensure git-compat-util.h is the first (indirect) include
  diff.h: reduce unnecessary includes
  object-store.h: reduce unnecessary includes
  commit.h: reduce unnecessary includes
  fsmonitor: reduce includes of cache.h
  cache.h: remove unnecessary headers
  treewide: remove cache.h inclusion due to previous changes
  cache,tree: move basic name compare functions from read-cache to tree
  cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c
  hash-ll.h: split out of hash.h to remove dependency on repository.h
  tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h
  dir.h: move DTYPE defines from cache.h
  versioncmp.h: move declarations for versioncmp.c functions from cache.h
  ws.h: move declarations for ws.c functions from cache.h
  match-trees.h: move declarations for match-trees.c functions from cache.h
  pkt-line.h: move declarations for pkt-line.c functions from cache.h
  base85.h: move declarations for base85.c functions from cache.h
  copy.h: move declarations for copy.c functions from cache.h
  server-info.h: move declarations for server-info.c functions from cache.h
  packfile.h: move pack_window and pack_entry from cache.h
  ...
2023-05-09 16:45:46 -07:00
Shuqi Liang 8c30be9176 diff-files: integrate with sparse index
Remove full index requirement for `git diff-files`. Refactor the
ensure_expanded and ensure_not_expanded functions by introducing a
common helper function, ensure_index_state. Add test to ensure the index
is no expanded in `git diff-files`.

The `p2000` tests demonstrate a ~96% execution time reduction for 'git
diff-files' and a ~97% execution time reduction for 'git diff-files'
for a file using a sparse index:

Test                                               before  after
-----------------------------------------------------------------------
2000.94: git diff-files (full-v3)                  0.09    0.08 -11.1%
2000.95: git diff-files (full-v4)                  0.09    0.09 +0.0%
2000.96: git diff-files (sparse-v3)                0.52    0.02 -96.2%
2000.97: git diff-files (sparse-v4)                0.51    0.02 -96.1%
2000.98: git diff-files -- f2/f4/a (full-v3)       0.06    0.07 +16.7%
2000.99: git diff-files -- f2/f4/a (full-v4)       0.08    0.08 +0.0%
2000.100: git diff-files -- f2/f4/a (sparse-v3)    0.46    0.01 -97.8%
2000.101: git diff-files -- f2/f4/a (sparse-v4)    0.51    0.02 -96.1%

Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-09 14:26:36 -07:00
Teng Long 425b4d7f47 push: introduce '--branches' option
The '--all' option of git-push built-in cmd support to push all branches
(refs under refs/heads) to remote. Under the usage, a user can easlily
work in some scenarios, for example, branches synchronization and batch
upload.

The '--all' was introduced for a long time, meanwhile, git supports to
customize the storage location under "refs/". when a new git user see
the usage like, 'git push origin --all', we might feel like we're
pushing _all_ the refs instead of just branches without looking at the
documents until we found the related description of it or '--mirror'.

To ensure compatibility, we cannot rename '--all' to another name
directly, one way is, we can try to add a new option '--heads' which be
identical with the functionality of '--all' to let the user understand
the meaning of representation more clearly. Actually, We've more or less
named options this way already, for example, in 'git-show-ref' and 'git
ls-remote'.

At the same time, we fix a related issue about the wrong help
information of '--all' option in code and add some test cases in
t5523, t5543 and t5583.

Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-06 14:36:43 -07:00
John Cai 44451a2e5e attr: teach "--attr-source=<tree>" global option to "git"
Earlier, 47cfc9bd (attr: add flag `--source` to work with tree-ish,
2023-01-14) taught "git check-attr" the "--source=<tree>" option to
allow it to read attribute files from a tree-ish, but did so only
for the command.  Just like "check-attr" users wanted a way to use
attributes from a tree-ish and not from the working tree files,
users of other commands (like "git diff") would benefit from the
same.

Undo most of the UI change the commit made, while keeping the
internal logic to read attributes from a given tree-ish. Expose the
internal logic via a new "--attr-source=<tree>" command line option
given to "git", so that it can be used with any git command that
runs as part of the main git process.

Additionally, add an environment variable GIT_ATTR_SOURCE that is set
when --attr-source is passed in, so that subprocesses use the same value
for the attributes source tree.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-06 14:34:09 -07:00
John Cai 9019d7dceb name-rev: make --stdin hidden
In 34ae3b70 (name-rev: deprecate --stdin in favor of --annotate-stdin),
we renamed --stdin to --annotate-stdin for the sake of a clearer name
for the option, and added text that indicates --stdin is deprecated. The
next step is to hide --stdin completely.

Make the option hidden. Also, update documentation to remove all
mentions of --stdin.

Signed-off-by: "John Cai" <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-06 14:32:20 -07:00
Junio C Hamano d699e27bd4 Merge branch 'tb/ban-strtok'
Mark strtok() and strtok_r() to be banned.

* tb/ban-strtok:
  banned.h: mark `strtok()` and `strtok_r()` as banned
  t/helper/test-json-writer.c: avoid using `strtok()`
  t/helper/test-oidmap.c: avoid using `strtok()`
  t/helper/test-hashmap.c: avoid using `strtok()`
  string-list: introduce `string_list_setlen()`
  string-list: multi-delimiter `string_list_split_in_place()`
2023-05-02 10:13:35 -07:00
Derrick Stolee cf9cd8b55c fsck: use local repository
In 0d30feef3c (fsck: create scaffolding for rev-index checks,
2023-04-17) and later 5a6072f631 (fsck: validate .rev file header,
2023-04-17), the check_pack_rev_indexes() method was created with a
'struct repository *r' parameter. However, this parameter was unused and
instead 'the_repository' was used in its place.

Fix this situation with the obvious replacement.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-02 08:48:23 -07:00
Derrick Stolee 756f1bcd29 fsck: verify checksums of all .bitmap files
If a filesystem-level corruption occurs in a .bitmap file, Git can react
poorly. This could take the form of a run-time error due to failing to
parse an EWAH bitmap or be more subtle such as returning the wrong set
of objects to a fetch or clone.

A natural first response to either of these kinds of errors is to run
'git fsck' to see if any files are corrupt. This currently ignores all
.bitmap files.

Add checks to 'git fsck' for all .bitmap files that are currently
associated with a multi-pack-index or pack file. Verify their checksums
using the hashfile API.

We iterate through all multi-pack-indexes and pack-files to be sure to
check all .bitmap files, not just the one that would be read by the
process. For example, a multi-pack-index bitmap overrules a pack-bitmap.
However, if the multi-pack-index is removed, the pack-bitmap may be
selected instead. Be thorough to include every file that could become
active in such a way. This includes checking files in alternates.

There is potential that we could extend this effort to check the
structure of the reachability bitmaps themselves, but it is very
expensive to do so. At minimum, it's as expensive as generating the
bitmaps in the first place, and that's assuming that we don't use the
trivial algorithm of verifying each bitmap individually. The trivial
algorithm will result in quadratic behavior (number of objects times
number of bitmapped commits) while the bitmap building operation
constructs a lattice of commits to build bitmaps incrementally and then
generate the final bitmaps from a subset of those commits.

If we were to extend 'git fsck' to check .bitmap file contents more
closely like this, then we would likely want to hide it behind an option
that signals the user is more willing to do expensive operations such as
this.

For testing, set up a repository with a pack-bitmap _and_ a
multi-pack-index bitmap. This requires some file movement to avoid
deleting the pack-bitmap during the repack that creates the
multi-pack-index bitmap. We can then verify that 'git fsck' is checking
all files, not just the "active" bitmap.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-05-02 08:48:22 -07:00
Junio C Hamano fc23c397c7 Merge branch 'tb/enable-cruft-packs-by-default'
When "gc" needs to retain unreachable objects, packing them into
cruft packs (instead of exploding them into loose object files) has
been offered as a more efficient option for some time.  Now the use
of cruft packs has been made the default and no longer considered
an experimental feature.

* tb/enable-cruft-packs-by-default:
  repository.h: drop unused `gc_cruft_packs`
  builtin/gc.c: make `gc.cruftPacks` enabled by default
  t/t9300-fast-import.sh: prepare for `gc --cruft` by default
  t/t6500-gc.sh: add additional test cases
  t/t6500-gc.sh: refactor cruft pack tests
  t/t6501-freshen-objects.sh: prepare for `gc --cruft` by default
  t/t5304-prune.sh: prepare for `gc --cruft` by default
  builtin/gc.c: ignore cruft packs with `--keep-largest-pack`
  builtin/repack.c: fix incorrect reference to '-C'
  pack-write.c: plug a leak in stage_tmp_packfiles()
2023-04-28 16:03:03 -07:00
Oswald Buddenhagen b734fe49fd messages: capitalization and punctuation exceptions
These are conscious violations of the usual rules for error messages,
based on this reasoning:

 - If an error message is directly followed by another sentence, it
   needs to be properly terminated with a period, lest the grammar
   looks broken and becomes hard to read.

 - That second sentence isn't actually an error message any more, so
   it should abide to conventional language rules for good looks and
   legibility. Arguably, these should be converted to advice
   messages (which the user can squelch, too), but that's a much
   bigger effort to get right.

 - Neither of these apply to the first hunk in do_exec(), but this
   two-line message looks just too much like a real sentence to not
   terminate it. Also, leaving it alone would make it asymmetrical
   to the other hunk.

Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-28 12:06:27 -07:00
Junio C Hamano a02675ad90 Merge branch 'ds/fsck-pack-revindex'
"git fsck" learned to validate the on-disk pack reverse index files.

* ds/fsck-pack-revindex:
  fsck: validate .rev file header
  fsck: check rev-index position values
  fsck: check rev-index checksums
  fsck: create scaffolding for rev-index checks
2023-04-27 16:00:59 -07:00
Junio C Hamano 849c8b3dbf Merge branch 'tb/pack-revindex-on-disk'
The on-disk reverse index that allows mapping from the pack offset
to the object name for the object stored at the offset has been
enabled by default.

* tb/pack-revindex-on-disk:
  t: invert `GIT_TEST_WRITE_REV_INDEX`
  config: enable `pack.writeReverseIndex` by default
  pack-revindex: introduce `pack.readReverseIndex`
  pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
  pack-revindex: make `load_pack_revindex` take a repository
  t5325: mark as leak-free
  pack-write.c: plug a leak in stage_tmp_packfiles()
2023-04-27 16:00:59 -07:00
Junio C Hamano 36628c56ed Merge branch 'ps/fix-geom-repack-with-alternates'
Geometric repacking ("git repack --geometric=<n>") in a repository
that borrows from an alternate object database had various corner
case bugs, which have been corrected.

* ps/fix-geom-repack-with-alternates:
  repack: disable writing bitmaps when doing a local repack
  repack: honor `-l` when calculating pack geometry
  t/helper: allow chmtime to print verbosely without modifying mtime
  pack-objects: extend test coverage of `--stdin-packs` with alternates
  pack-objects: fix error when same packfile is included and excluded
  pack-objects: fix error when packing same pack twice
  pack-objects: split out `--stdin-packs` tests into separate file
  repack: fix generating multi-pack-index with only non-local packs
  repack: fix trying to use preferred pack in alternates
  midx: fix segfault with no packs and invalid preferred pack
2023-04-25 13:56:20 -07:00
Junio C Hamano 80d268f309 Merge branch 'jk/protocol-cap-parse-fix'
The code to parse capability list for v0 on-wire protocol fell into
an infinite loop when a capability appears multiple times, which
has been corrected.

* jk/protocol-cap-parse-fix:
  v0 protocol: use size_t for capability length/offset
  t5512: test "ls-remote --heads --symref" filtering with v0 and v2
  t5512: allow any protocol version for filtered symref test
  t5512: add v2 support for "ls-remote --symref" test
  v0 protocol: fix sha1/sha256 confusion for capabilities^{}
  t5512: stop referring to "v1" protocol
  v0 protocol: fix infinite loop when parsing multi-valued capabilities
2023-04-25 13:56:20 -07:00
Junio C Hamano 0807e57807 Merge branch 'en/header-split-cache-h'
Header clean-up.

* en/header-split-cache-h: (24 commits)
  protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
  mailmap, quote: move declarations of global vars to correct unit
  treewide: reduce includes of cache.h in other headers
  treewide: remove double forward declaration of read_in_full
  cache.h: remove unnecessary includes
  treewide: remove cache.h inclusion due to pager.h changes
  pager.h: move declarations for pager.c functions from cache.h
  treewide: remove cache.h inclusion due to editor.h changes
  editor: move editor-related functions and declarations into common file
  treewide: remove cache.h inclusion due to object.h changes
  object.h: move some inline functions and defines from cache.h
  treewide: remove cache.h inclusion due to object-file.h changes
  object-file.h: move declarations for object-file.c functions from cache.h
  treewide: remove cache.h inclusion due to git-zlib changes
  git-zlib: move declarations for git-zlib functions from cache.h
  treewide: remove cache.h inclusion due to object-name.h changes
  object-name.h: move declarations for object-name.c functions from cache.h
  treewide: remove unnecessary cache.h inclusion
  treewide: be explicit about dependence on mem-pool.h
  treewide: be explicit about dependence on oid-array.h
  ...
2023-04-25 13:56:20 -07:00
Taylor Blau 52acddf36c string-list: multi-delimiter string_list_split_in_place()
Enhance `string_list_split_in_place()` to accept multiple characters as
delimiters instead of a single character.

Instead of using `strchr(2)` to locate the first occurrence of the given
delimiter character, `string_list_split_in_place_multi()` uses
`strcspn(2)` to move past the initial segment of characters comprised of
any characters in the delimiting set.

When only a single delimiting character is provided, `strpbrk(2)` (which
is implemented with `strcspn(2)`) has equivalent performance to
`strchr(2)`. Modern `strcspn(2)` implementations treat an empty
delimiter or the singleton delimiter as a special case and fall back to
calling strchrnul(). Both glibc[1] and musl[2] implement `strcspn(2)`
this way.

This change is one step to removing `strtok(2)` from the tree. Note that
`string_list_split_in_place()` is not a strict replacement for
`strtok()`, since it will happily turn sequential delimiter characters
into empty entries in the resulting string_list. For example:

    string_list_split_in_place(&xs, "foo:;:bar:;:baz", ":;", -1)

would yield a string list of:

    ["foo", "", "", "bar", "", "", "baz"]

Callers that wish to emulate the behavior of strtok(2) more directly
should call `string_list_remove_empty_items()` after splitting.

To avoid regressions for the new multi-character delimter cases, update
t0063 in this patch as well.

[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35
[2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 16:01:28 -07:00
Elijah Newren d4a4f9291d commit.h: reduce unnecessary includes
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:33 -07:00
Elijah Newren d1cbe1e6d8 hash-ll.h: split out of hash.h to remove dependency on repository.h
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository->hash_algo).  However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id.  Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.

This allows hash.h and object.h to be fairly small, minimal headers.  It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:32 -07:00
Elijah Newren b388633c5c pkt-line.h: move declarations for pkt-line.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:32 -07:00
Elijah Newren d5fff46f40 copy.h: move declarations for copy.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:31 -07:00
Elijah Newren 623b80bef2 server-info.h: move declarations for server-info.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:31 -07:00
Elijah Newren cb2a51356d symlinks.h: move declarations for symlinks.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:31 -07:00
Junio C Hamano b64894c206 Merge branch 'ow/ref-filter-omit-empty'
"git branch --format=..." and "git format-patch --format=..."
learns "--omit-empty" to hide refs that whose formatting result
becomes an empty string from the output.

* ow/ref-filter-omit-empty:
  branch, for-each-ref, tag: add option to omit empty lines
2023-04-21 15:35:05 -07:00
Junio C Hamano 7ac228c994 Merge branch 'rn/sparse-describe'
"git describe --dirty" learns to work better with sparse-index.

* rn/sparse-describe:
  describe: enable sparse index for describe
2023-04-21 15:35:04 -07:00
M Hickford a5c76569e7 credential: new attribute oauth_refresh_token
Git authentication with OAuth access token is supported by every popular
Git host including GitHub, GitLab and BitBucket [1][2][3]. Credential
helpers Git Credential Manager (GCM) and git-credential-oauth generate
OAuth credentials [4][5]. Following RFC 6749, the application prints a
link for the user to authorize access in browser. A loopback redirect
communicates the response including access token to the application.

For security, RFC 6749 recommends that OAuth response also includes
expiry date and refresh token [6]. After expiry, applications can use
the refresh token to generate a new access token without user
reauthorization in browser. GitLab and BitBucket set the expiry at two
hours [2][3]. (GitHub doesn't populate expiry or refresh token.)

However the Git credential protocol has no attribute to store the OAuth
refresh token (unrecognised attributes are silently discarded). This
means that the user has to regularly reauthorize the helper in browser.
On a browserless system, this is particularly intrusive, requiring a
second device.

Introduce a new attribute oauth_refresh_token. This is especially
useful when a storage helper and a read-only OAuth helper are configured
together. Recall that `credential fill` calls each helper until it has a
non-expired password.

```
[credential]
	helper = storage  # eg. cache or osxkeychain
	helper = oauth
```

The OAuth helper can use the stored refresh token forwarded by
`credential fill` to generate a fresh access token without opening the
browser. See
https://github.com/hickford/git-credential-oauth/pull/3/files
for an implementation tested with this patch.

Add support for the new attribute to credential-cache. Eventually, I
hope to see support in other popular storage helpers.

Alternatives considered: ask helpers to store all unrecognised
attributes. This seems excessively complex for no obvious gain.
Helpers would also need extra information to distinguish between
confidential and non-confidential attributes.

Workarounds: GCM abuses the helper get/store/erase contract to store the
refresh token during credential *get* as the password for a fictitious
host [7] (I wrote this hack). This workaround is only feasible for a
monolithic helper with its own storage.

[1] https://github.blog/2012-09-21-easier-builds-and-deployments-using-git-over-https-and-oauth/
[2] https://docs.gitlab.com/ee/api/oauth2.html#access-git-over-https-with-access-token
[3] https://support.atlassian.com/bitbucket-cloud/docs/use-oauth-on-bitbucket-cloud/#Cloning-a-repository-with-an-access-token
[4] https://github.com/GitCredentialManager/git-credential-manager
[5] https://github.com/hickford/git-credential-oauth
[6] https://datatracker.ietf.org/doc/html/rfc6749#section-5.1
[7] 66b94e489a/src/shared/GitLab/GitLabHostProvider.cs (L207)

Signed-off-by: M Hickford <mirth.hickford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-21 09:38:30 -07:00
Junio C Hamano a4a4db8cf7 Merge branch 'gc/better-error-when-local-clone-fails-with-symlink'
"git clone --local" stops copying from an original repository that
has symbolic links inside its $GIT_DIR; an error message when that
happens has been updated.

* gc/better-error-when-local-clone-fails-with-symlink:
  clone: error specifically with --local and symlinked objects
2023-04-20 14:33:36 -07:00
Junio C Hamano 08bd076ce4 Merge branch 'rs/get-tar-commit-id-use-defined-const'
Code clean-up to replace a hardcoded constant with a CPP macro.

* rs/get-tar-commit-id-use-defined-const:
  get-tar-commit-id: use TYPEFLAG_GLOBAL_HEADER instead of magic value
2023-04-20 14:33:36 -07:00
Taylor Blau e3e24de1bf builtin/gc.c: make gc.cruftPacks enabled by default
Back in 5b92477f89 (builtin/gc.c: conditionally avoid pruning objects
via loose, 2022-05-20), `git gc` learned the `--cruft` option and
`gc.cruftPacks` configuration to opt-in to writing cruft packs when
collecting or pruning unreachable objects.

Cruft packs were introduced with the merge in a50036da1a (Merge branch
'tb/cruft-packs', 2022-06-03). They address the problem of "loose object
explosions", where Git will write out many individual loose objects when
there is a large number of unreachable objects that have not yet aged
past `--prune=<date>`.

Instead of keeping track of those unreachable yet recent objects via
their loose object file's mtime, cruft packs collect all unreachable
objects into a single pack with a corresponding `*.mtimes` file that
acts as a table to store the mtimes of all unreachable objects. This
prevents the need to store unreachable objects as loose as they age out
of the repository, and avoids the problem of loose object explosions.

Beyond avoiding loose object explosions, cruft packs also act as a more
efficient mechanism to store unreachable objects as they age out of a
repository. This is because pairs of similar unreachable objects serve
as delta bases for one another.

In 5b92477f89, the feature was introduced as experimental. Since then,
GitHub has been running these patches in every repository generating
hundreds of millions of cruft packs along the way. The feature is
battle-tested, and avoids many pathological cases such as above. Users
who either run `git gc` manually, or via `git maintenance` can benefit
from having cruft packs.

As such, enable cruft pack generation to take place by default (by
making `gc.cruftPacks` have the default of "true" rather than "false).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-18 14:56:48 -07:00
Taylor Blau 05b9013b71 builtin/gc.c: ignore cruft packs with --keep-largest-pack
When cruft packs were implemented, we never adjusted the code for `git
gc`'s `--keep-largest-pack` and `gc.bigPackThreshold` to ignore cruft
packs. This option and configuration option share a common
implementation, but including cruft packs is wrong in both cases:

  - Running `git gc --keep-largest-pack` in a repository where the
    largest pack is the cruft pack itself will make it impossible for
    `git gc` to prune objects, since the cruft pack itself is kept.

  - The same is true for `gc.bigPackThreshold`, if the size of the cruft
    pack exceeds the limit set by the caller.

In the future, it is possible that `gc.bigPackThreshold` could be used
to write a separate cruft pack containing any new unreachable objects
that entered the repository since the last time a cruft pack was
written.

There are some complexities to doing so, mainly around handling
pruning objects that are in an existing cruft pack that is above the
threshold (which would either need to be rewritten, or else delay
pruning). Rewriting a substantially similar cruft pack isn't ideal, but
it is significantly better than the status-quo.

If users have large cruft packs that they don't want to rewrite, they
can mark them as `*.keep` packs. But in general, if a repository has a
cruft pack that is so large it is slowing down GC's, it should probably
be pruned anyway.

In the meantime, ignore cruft packs in the common implementation for
both of these options, and add a pair of tests to prevent any future
regressions here.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-18 14:56:47 -07:00
Taylor Blau c512f31109 builtin/repack.c: fix incorrect reference to '-C'
When cruft packs were originally being developed, `-C` was designated as
the short-form for `--cruft` (as in `git repack -C`).

This was dropped due to confusion with Git's top-level `-C` option
before submitting to the list. But the reference to it in
`--cruft-expiration`'s help text was never updated. Fix that dangling
reference in this patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-18 14:56:47 -07:00
Junio C Hamano 3c957e6d39 Merge branch 'pw/rebase-cleanup-merge-strategy-option-handling'
Clean-up of the code path that deals with merge strategy option
handling in "git rebase".

* pw/rebase-cleanup-merge-strategy-option-handling:
  rebase: remove a couple of redundant strategy tests
  rebase -m: fix serialization of strategy options
  rebase -m: cleanup --strategy-option handling
  sequencer: use struct strvec to store merge strategy options
  rebase: stop reading and writing unnecessary strategy state
2023-04-17 18:05:13 -07:00
Junio C Hamano 66bf8f1943 Merge branch 'cm/branch-delete-error-message-update'
"git branch -d origin/master" would say "no such branch", but it is
likely a missed "-r" if refs/remotes/origin/master exists.  The
command has been taught to give such a hint in its error message.

* cm/branch-delete-error-message-update:
  branch: improve error log on branch not found by checking remotes refs
2023-04-17 18:05:12 -07:00
Junio C Hamano 9d8370d445 Merge branch 'tk/mergetool-gui-default-config'
"git mergetool" and "git difftool" learns a new configuration
guiDefault to optionally favor configured guitool over non-gui-tool
automatically when $DISPLAY is set.

* tk/mergetool-gui-default-config:
  mergetool: new config guiDefault supports auto-toggling gui by DISPLAY
2023-04-17 18:05:11 -07:00
Junio C Hamano d47ee0a565 Merge branch 'sl/sparse-write-tree'
"git write-tree" learns to work better with sparse-index.

* sl/sparse-write-tree:
  write-tree: integrate with sparse index
2023-04-17 18:05:11 -07:00
Derrick Stolee 5a6072f631 fsck: validate .rev file header
While parsing a .rev file, we check the header information to be sure it
makes sense. This happens before doing any additional validation such as
a checksum or value check. In order to differentiate between a bad
header and a non-existent file, we need to update the API for loading a
reverse index.

Make load_pack_revindex_from_disk() non-static and specify that a
positive value means "the file does not exist" while other errors during
parsing are negative values. Since an invalid header prevents setting up
the structures we would use for further validations, we can stop at that
point.

The place where we can distinguish between a missing file and a corrupt
file is inside load_revindex_from_disk(), which is used both by pack
rev-indexes and multi-pack-index rev-indexes. Some tests in t5326
demonstrate that it is critical to take some conditions to allow
positive error signals.

Add tests that check the three header values.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-17 14:39:05 -07:00
Derrick Stolee 0d30feef3c fsck: create scaffolding for rev-index checks
The 'fsck' builtin checks many of Git's on-disk data structures, but
does not currently validate the pack rev-index files (a .rev file to
pair with a .pack and .idx file).

Before doing a more-involved check process, create the scaffolding
within builtin/fsck.c to have a new error type and add that error type
when the API method verify_pack_revindex() returns an error. That method
does nothing currently, but we will add checks to it in later changes.

For now, check that 'git fsck' succeeds without any errors in the normal
case. Future checks will be paired with tests that corrupt the .rev file
appropriately.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-17 14:39:04 -07:00
Jeff King 7ce4c8f752 v0 protocol: use size_t for capability length/offset
When parsing server capabilities, we use "int" to store lengths and
offsets. At first glance this seems like a spot where our parser may be
confused by integer overflow if somebody sent us a malicious response.

In practice these strings are all bounded by the 64k limit of a
pkt-line, so using "int" is OK. However, it makes the code simpler to
audit if they just use size_t everywhere. Note that because we take
these parameters as pointers, this also forces many callers to update
their declared types.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 15:08:13 -07:00
Patrick Steinhardt d85cd18777 repack: disable writing bitmaps when doing a local repack
In order to write a bitmap, we need to have full coverage of all objects
that are about to be packed. In the traditional non-multi-pack-index
world this meant we need to do a full repack of all objects into a
single packfile. But in the new multi-pack-index world we can get away
with writing bitmaps when we have multiple packfiles as long as the
multi-pack-index covers all objects.

This is not always the case though. When asked to perform a repack of
local objects, only, then we cannot guarantee to have full coverage of
all objects regardless of whether we do a full repack or a repack with a
multi-pack-index. The end result is that writing the bitmap will fail in
both worlds:

    $ git multi-pack-index write --stdin-packs --bitmap <packfiles
    warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing)
    error: could not write multi-pack bitmap

Now there are two different ways to fix this. The first one would be to
amend git-multi-pack-index(1) to disable writing bitmaps when we notice
that we don't have full object coverage.

    - We don't have enough information in git-multi-pack-index(1) in
      order to tell whether the local repository _should_ have full
      coverage. Because even when connected to an alternate object
      directory, it may be the case that we still have all objects
      around in the main object database.

    - git-multi-pack-index(1) is quite a low-level tool. Automatically
      disabling functionality that it was asked to provide does not feel
      like the right thing to do.

We can easily fix it at a higher level in git-repack(1) though. When
asked to only include local objects via `-l` and when connected to an
alternate object directory then we will override the user's ask and
disable writing bitmaps with a warning. This is similar to what we do in
git-pack-objects(1), where we also disable writing bitmaps in case we
omit an object from the pack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 10:27:52 -07:00
Patrick Steinhardt 932c16c04b repack: honor -l when calculating pack geometry
When the user passes `-l` to git-repack(1), then they essentially ask us
to only repack objects part of the local object database while ignoring
any packfiles part of an alternate object database. And we in fact honor
this bit when doing a geometric repack as the resulting packfile will
only ever contain local objects.

What we're missing though is that we don't take locality of packfiles
into account when computing whether the geometric sequence is intact or
not. So even though we would only ever roll up local packfiles anyway,
we could end up trying to repack because of non-local packfiles. This
does not make much sense, and in the worst case it can cause us to try
and do the geometric repack over and over again because we're never able
to restore the geometric sequence.

Fix this bug by honoring whether the user has passed `-l`. If so, we
skip adding any non-local packfiles to the pack geometry.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 10:27:52 -07:00
Patrick Steinhardt 752b465c3c pack-objects: fix error when same packfile is included and excluded
When passing the same packfile both as included and excluded via the
`--stdin-packs` option, then we will return an error because the
excluded packfile cannot be found. This is because we will only set the
`util` pointer for the included packfile list if it was found, so that
we later die when we notice that it's in fact not set for the excluded
packfile list.

Fix this bug by always setting the `util` pointer for both the included
and excluded list entries.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 10:27:51 -07:00
Patrick Steinhardt 732194b5f2 pack-objects: fix error when packing same pack twice
When passed the same packfile twice via `--stdin-packs` we return an
error that the packfile supposedly was not found. This is because when
reading packs into the list of included or excluded packfiles, we will
happily re-add packfiles even if they are part of the lists already. And
while the list can now contain duplicates, we will only set the `util`
pointer of the first list entry to the `packed_git` structure. We notice
that at a later point when checking that all list entries have their
`util` pointer set and die with an error.

While this is kind of a nonsensical request, this scenario can be hit
when doing geometric repacks. When a repository is connected to an
alternate object directory and both have the exact same packfile then
both would get added to the geometric sequence. And when we then decide
to perform the repack, we will invoke git-pack-objects(1) with the same
packfile twice.

Fix this bug by removing any duplicates from both the included and
excluded packs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 10:27:51 -07:00
Patrick Steinhardt 51861340f8 repack: fix generating multi-pack-index with only non-local packs
When writing the multi-pack-index with geometric repacking we will add
all packfiles to the index that are part of the geometric sequence. This
can potentially also include packfiles borrowed from an alternate object
directory. But given that a multi-pack-index can only ever include packs
that are part of the main object database this does not make much sense
whatsoever.

In the edge case where all packfiles are contained in the alternate
object database and the local repository has none itself this bug can
cause us to invoke git-multi-pack-index(1) with only non-local packfiles
that it ultimately cannot find. This causes it to return an error and
thus causes the geometric repack to fail.

Fix the code to skip non-local packfiles.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 10:27:51 -07:00
Patrick Steinhardt 3d74a2337c repack: fix trying to use preferred pack in alternates
When doing a geometric repack with multi-pack-indices, then we ask
git-multi-pack-index(1) to use the largest packfile as the preferred
pack. It can happen though that the largest packfile is not part of the
main object database, but instead part of an alternate object database.
The result is that git-multi-pack-index(1) will not be able to find the
preferred pack and print a warning. It then falls back to use the first
packfile that the multi-pack-index shall reference.

Fix this bug by only considering packfiles as preferred pack that are
local. This is the right thing to do given that a multi-pack-index
should never reference packfiles borrowed from an alternate.

While at it, rename the function `get_largest_active_packfile()` to
`get_preferred_pack()` to better document its intent.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-14 10:27:51 -07:00
Øystein Walle aabfdc9514 branch, for-each-ref, tag: add option to omit empty lines
If the given format string expands to the empty string, a newline is
still printed. This makes using the output linewise more tedious. For
example, git update-ref --stdin does not accept empty lines.

Add options to "git branch", "git for-each-ref", and "git tag" to
not print these empty lines.  The default behavior remains the same.

Signed-off-by: Øystein Walle <oystwa@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-13 08:07:45 -07:00
Taylor Blau 9f7f10a282 t: invert GIT_TEST_WRITE_REV_INDEX
Back in e8c58f894b (t: support GIT_TEST_WRITE_REV_INDEX, 2021-01-25), we
added a test knob to conditionally enable writing a ".rev" file when
indexing a pack. At the time, this was used to ensure that the test
suite worked even when ".rev" files were written, which served as a
stress-test for the on-disk reverse index implementation.

Now that reading from on-disk ".rev" files is enabled by default, the
test knob `GIT_TEST_WRITE_REV_INDEX` no longer has any meaning.

We could get rid of the option entirely, but there would be no
convenient way to test Git when ".rev" files *aren't* in place.

Instead of getting rid of the option, invert its meaning to instead
disable writing ".rev" files, thereby running the test suite in a mode
where the reverse index is generated from scratch.

This ensures that, when GIT_TEST_NO_WRITE_REV_INDEX is set to some
spelling of "true", we are still running and exercising Git's behavior
when forced to generate reverse indexes from scratch. Do so by setting
it in the linux-TEST-vars CI run to ensure that we are maintaining good
coverage of this now-legacy code.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-13 07:55:46 -07:00
Taylor Blau a8dd7e05b1 config: enable pack.writeReverseIndex by default
Back in e37d0b8730 (builtin/index-pack.c: write reverse indexes,
2021-01-25), Git learned how to read and write a pack's reverse index
from a file instead of in-memory.

A pack's reverse index is a mapping from pack position (that is, the
order that objects appear together in a ".pack")  to their position in
lexical order (that is, the order that objects are listed in an ".idx"
file).

Reverse indexes are consulted often during pack-objects, as well as
during auxiliary operations that require mapping between pack offsets,
pack order, and index index.

They are useful in GitHub's infrastructure, where we have seen a
dramatic increase in performance when writing ".rev" files[1]. In
particular:

  - an ~80% reduction in the time it takes to serve fetches on a popular
    repository, Homebrew/homebrew-core.

  - a ~60% reduction in the peak memory usage to serve fetches on that
    same repository.

  - a collective savings of ~35% in CPU time across all pack-objects
    invocations serving fetches across all repositories in a single
    datacenter.

Reverse indexes are also beneficial to end-users as well as forges. For
example, the time it takes to generate a pack containing the objects for
the 10 most recent commits in linux.git (representing a typical push) is
significantly faster when on-disk reverse indexes are available:

    $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~10 } >in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
    Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     543.0 ms ±  20.3 ms    [User: 616.2 ms, System: 58.8 ms]
      Range (min … max):   521.0 ms … 577.9 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     245.0 ms ±  11.4 ms    [User: 335.6 ms, System: 31.3 ms]
      Range (min … max):   226.0 ms … 259.6 ms    13 runs

    Summary
      'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
	2.22 ± 0.13 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'

The same is true of writing a pack containing the objects for the 30
most-recent commits:

    $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~30 } >in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
    Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     866.5 ms ±  16.2 ms    [User: 1414.5 ms, System: 97.0 ms]
      Range (min … max):   839.3 ms … 886.9 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
      Time (mean ± σ):     581.6 ms ±  10.2 ms    [User: 1181.7 ms, System: 62.6 ms]
      Range (min … max):   567.5 ms … 599.3 ms    10 runs

    Summary
      'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
	1.49 ± 0.04 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'

...and savings on trivial operations like computing the on-disk size of
a single (packed) object are even more dramatic:

    $ git rev-parse HEAD >in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in'
    Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in
      Time (mean ± σ):     305.8 ms ±  11.4 ms    [User: 264.2 ms, System: 41.4 ms]
      Range (min … max):   290.3 ms … 331.1 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in
      Time (mean ± σ):       4.0 ms ±   0.3 ms    [User: 1.7 ms, System: 2.3 ms]
      Range (min … max):     1.6 ms …   4.6 ms    1155 runs

    Summary
      'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran
       76.96 ± 6.25 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in'

In the more than two years since e37d0b8730 was merged, Git's
implementation of on-disk reverse indexes has been thoroughly tested,
both from users enabling `pack.writeReverseIndexes`, and from GitHub's
deployment of the feature. The latter has been running without incident
for more than two years.

This patch changes Git's behavior to write on-disk reverse indexes by
default when indexing a pack, which should make the above operations
faster for everybody's Git installation after a repack.

(The previous commit explains some potential drawbacks of using on-disk
reverse indexes in certain limited circumstances, that essentially boil
down to a trade-off between time to generate, and time to access. For
those limited cases, the `pack.readReverseIndex` escape hatch can be
used).

[1]: https://github.blog/2021-04-29-scaling-monorepo-maintenance/#reverse-indexes

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-13 07:55:46 -07:00
Junio C Hamano 96f4113ac0 Merge branch 'jc/clone-object-format-from-void'
"git clone" from an empty repository learned to propagate the
choice of the hash algorithm from the source repository to the
newly created repository.

* jc/clone-object-format-from-void:
  clone: propagate object-format when cloning from void
2023-04-11 13:49:13 -07:00
Junio C Hamano d02343b599 Merge branch 'ws/sparse-check-rules'
"git sparse-checkout" command learns a debugging aid for the sparse
rule definitions.

* ws/sparse-check-rules:
  builtin/sparse-checkout: add check-rules command
  builtin/sparse-checkout: remove NEED_WORK_TREE flag
2023-04-11 13:49:12 -07:00
Elijah Newren 65156bb7ec treewide: remove double forward declaration of read_in_full
cache.h's nature of a dumping ground of includes prevented it from
being included in some compat/ files, forcing us into a workaround
of having a double forward declaration of the read_in_full() function
(see commit 14086b0a13 ("compat/pread.c: Add a forward declaration to
fix a warning", 2007-11-17)).  Now that we have moved functions like
read_in_full() from cache.h to wrapper.h, and wrapper.h isn't littered
with unrelated and scary #defines, get rid of the extra forward
declaration and just have compat/pread.c include wrapper.h.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:11 -07:00
Elijah Newren ca4eed708d pager.h: move declarations for pager.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:10 -07:00
Elijah Newren 4e120823a3 editor: move editor-related functions and declarations into common file
cache.h and strbuf.[ch] had editor-related functions.  Move these into
editor.[ch].

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:10 -07:00
Elijah Newren 87bed17907 object-file.h: move declarations for object-file.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:10 -07:00
Elijah Newren d88dbaa718 git-zlib: move declarations for git-zlib functions from cache.h
Move functions from cache.h for zlib.c into a new header file.  Since
adding a "zlib.h" would cause issues with the real zlib, rename zlib.c
to git-zlib.c while we are at it.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:10 -07:00
Elijah Newren dabab1d6e6 object-name.h: move declarations for object-name.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:09 -07:00
Elijah Newren 5bc07225e5 treewide: be explicit about dependence on mem-pool.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:09 -07:00
Elijah Newren 6f2d743043 treewide: be explicit about dependence on oid-array.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:09 -07:00
Elijah Newren 75f273d9b7 treewide: be explicit about dependence on pack-revindex.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:09 -07:00
Elijah Newren 73359a9b43 treewide: be explicit about dependence on convert.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:09 -07:00
Elijah Newren 6c6ddf92d5 treewide: be explicit about dependence on advice.h
Dozens of files made use of advice functions, without explicitly
including advice.h.  This made it more difficult to find which files
could remove a dependence on cache.h.  Make C files explicitly include
advice.h if they are using it.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:09 -07:00
Elijah Newren 74ea5c9574 treewide: be explicit about dependence on trace.h & trace2.h
Dozens of files made use of trace and trace2 functions, without
explicitly including trace.h or trace2.h.  This made it more difficult
to find which files could remove a dependence on cache.h.  Make C files
explicitly include trace.h or trace2.h if they are using them.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:08 -07:00
Glen Choo 4e33535ea9 clone: error specifically with --local and symlinked objects
6f054f9fb3 (builtin/clone.c: disallow --local clones with
symlinks, 2022-07-28) gives a good error message when "git clone
--local" fails when the repo to clone has symlinks in
"$GIT_DIR/objects". In bffc762f87 (dir-iterator: prevent top-level
symlinks without FOLLOW_SYMLINKS, 2023-01-24), we later extended this
restriction to the case where "$GIT_DIR/objects" is itself a symlink,
but we didn't update the error message then - bffc762f87's tests show
that we print a generic "failed to start iterator over" message.

This is exacerbated by the fact that Documentation/git-clone.txt
mentions neither restriction, so users are left wondering if this is
intentional behavior or not.

Fix this by adding a check to builtin/clone.c: when doing a local clone,
perform an extra check to see if "$GIT_DIR/objects" is a symlink, and if
so, assume that that was the reason for the failure and report the
relevant information. Ideally, dir_iterator_begin() would tell us that
the real failure reason is the presence of the symlink, but (as far as I
can tell) there isn't an appropriate errno value for that.

Also, update Documentation/git-clone.txt to reflect that this
restriction exists.

Signed-off-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:46:09 -07:00
Phillip Wood 4a8bc9860a rebase -m: cleanup --strategy-option handling
When handling "--strategy-option" rebase collects the commands into a
struct string_list, then concatenates them into a string, prepending "--"
to each one before splitting the string and removing the "--" prefix.
This is an artifact of the scripted rebase and the need to support
"rebase --preserve-merges". Now that "--preserve-merges" no-longer
exists we can cleanup the way the argument is handled.

The tests for a bad strategy option are adjusted now that
parse_strategy_opts() is no-longer called when starting a rebase. The
fact that it only errors out when running "git rebase --continue" is a
mixed blessing but the next commit will fix the root cause of the
parsing problem so lets not worry about that here.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-10 09:53:19 -07:00
Phillip Wood fb60b9f37f sequencer: use struct strvec to store merge strategy options
The sequencer stores the merge strategy options in an array of strings
which allocated with ALLOC_GROW(). Using "struct strvec" avoids manually
managing the memory of that array and simplifies the code.

Aside from memory allocation the changes to the sequencer are largely
mechanical, changing xopts_nr to xopts.nr and xopts[i] to xopts.v[i]. A
new option parsing macro OPT_STRVEC() is also added to collect the
strategy options.  Hopefully this can be used to simplify the code in
builtin/merge.c in the future.

Note that there is a change of behavior to "git cherry-pick" and "git
revert" as passing “--no-strategy-option” will now clear any previous
strategy options whereas before this change it did nothing.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-10 09:53:19 -07:00
Phillip Wood 461434a013 rebase: stop reading and writing unnecessary strategy state
The state files for "--strategy" and "--strategy-option" are written and
read twice, once by builtin/rebase.c and then by sequencer.c. This is an
artifact of the scripted rebase and the need to support "rebase
--preserve-merges". Now that "--preserve-merges" no-longer exists we
only need to read and write these files in sequencer.c. This enables us
to remove a call to free() in read_strategy_opts() that was added by
f1f4ebf432 (sequencer.c: fix "opts->strategy" leak in
read_strategy_opts(), 2022-11-08) as this commit fixes the root cause of
that leak.

There is further scope for removing duplication in the reading and
writing of state files between builtin/rebase.c and sequencer.c but that
is left for a follow up series.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-10 09:53:19 -07:00
René Scharfe c870de6502 get-tar-commit-id: use TYPEFLAG_GLOBAL_HEADER instead of magic value
Use the same macro in the archive reader code as on the writer side in
archive-tar.c to document the connection.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-10 09:22:34 -07:00
Junio C Hamano 89833fc249 Merge branch 'ds/fetch-bundle-uri-with-all'
"git fetch --all" does not have to download and handle the same
bundleURI over and over, which has been corrected.

* ds/fetch-bundle-uri-with-all:
  fetch: download bundles once, even with --all
2023-04-06 13:38:32 -07:00
Junio C Hamano 6047b28eb7 Merge branch 'en/header-split-cleanup'
Split key function and data structure definitions out of cache.h to
new header files and adjust the users.

* en/header-split-cleanup:
  csum-file.h: remove unnecessary inclusion of cache.h
  write-or-die.h: move declarations for write-or-die.c functions from cache.h
  treewide: remove cache.h inclusion due to setup.h changes
  setup.h: move declarations for setup.c functions from cache.h
  treewide: remove cache.h inclusion due to environment.h changes
  environment.h: move declarations for environment.c functions from cache.h
  treewide: remove unnecessary includes of cache.h
  wrapper.h: move declarations for wrapper.c functions from cache.h
  path.h: move function declarations for path.c functions from cache.h
  cache.h: remove expand_user_path()
  abspath.h: move absolute path functions from cache.h
  environment: move comment_line_char from cache.h
  treewide: remove unnecessary cache.h inclusion from several sources
  treewide: remove unnecessary inclusion of gettext.h
  treewide: be explicit about dependence on gettext.h
  treewide: remove unnecessary cache.h inclusion from a few headers
2023-04-06 13:38:31 -07:00
Junio C Hamano 72871b198f Merge branch 'ab/remove-implicit-use-of-the-repository'
Code clean-up around the use of the_repository.

* ab/remove-implicit-use-of-the-repository:
  libs: use "struct repository *" argument, not "the_repository"
  post-cocci: adjust comments for recent repo_* migration
  cocci: apply the "revision.h" part of "the_repository.pending"
  cocci: apply the "rerere.h" part of "the_repository.pending"
  cocci: apply the "refs.h" part of "the_repository.pending"
  cocci: apply the "promisor-remote.h" part of "the_repository.pending"
  cocci: apply the "packfile.h" part of "the_repository.pending"
  cocci: apply the "pretty.h" part of "the_repository.pending"
  cocci: apply the "object-store.h" part of "the_repository.pending"
  cocci: apply the "diff.h" part of "the_repository.pending"
  cocci: apply the "commit.h" part of "the_repository.pending"
  cocci: apply the "commit-reach.h" part of "the_repository.pending"
  cocci: apply the "cache.h" part of "the_repository.pending"
  cocci: add missing "the_repository" macros to "pending"
  cocci: sort "the_repository" rules by header
  cocci: fix incorrect & verbose "the_repository" rules
  cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-06 13:38:30 -07:00
Junio C Hamano 87daf40750 Merge branch 'ab/config-multi-and-nonbool'
Assorted config API updates.

* ab/config-multi-and-nonbool:
  for-each-repo: with bad config, don't conflate <path> and <cmd>
  config API: add "string" version of *_value_multi(), fix segfaults
  config API users: test for *_get_value_multi() segfaults
  for-each-repo: error on bad --config
  config API: have *_multi() return an "int" and take a "dest"
  versioncmp.c: refactor config reading next commit
  config API: add and use a "git_config_get()" family of functions
  config tests: add "NULL" tests for *_get_value_multi()
  config tests: cover blind spots in git_die_config() tests
2023-04-06 13:38:29 -07:00
Junio C Hamano e9dffbc7f1 Merge branch 'ps/fetch-ref-update-reporting'
Clean-up of the code path that reports what "git fetch" did to each
ref.

* ps/fetch-ref-update-reporting:
  fetch: centralize printing of reference updates
  fetch: centralize logic to print remote URL
  fetch: centralize handling of per-reference format
  fetch: pass the full local reference name to `format_display`
  fetch: move output format into `display_state`
  fetch: move reference width calculation into `display_state`
2023-04-06 13:38:28 -07:00
Junio C Hamano 955abf5f72 Merge branch 'jk/unused-post-2.40-part2'
Code clean-up for "-Wunused-parameter" build.

* jk/unused-post-2.40-part2:
  parse-options: drop parse_opt_unknown_cb()
  t/helper: mark unused argv/argc arguments
  mark "argv" as unused when we check argc
  builtins: mark unused prefix parameters
  builtins: annotate always-empty prefix parameters
  builtins: always pass prefix to parse_options()
  fast-import: fix file access when run from subdir
2023-04-06 13:38:28 -07:00
Junio C Hamano 7727da99df Merge branch 'ds/ahead-behind'
"git for-each-ref" learns '%(ahead-behind:<base>)' that computes the
distances from a single reference point in the history with bunch
of commits in bulk.

* ds/ahead-behind:
  commit-reach: add tips_reachable_from_bases()
  for-each-ref: add ahead-behind format atom
  commit-reach: implement ahead_behind() logic
  commit-graph: introduce `ensure_generations_valid()`
  commit-graph: return generation from memory
  commit-graph: simplify compute_generation_numbers()
  commit-graph: refactor compute_topological_levels()
  for-each-ref: explicitly test no matches
  for-each-ref: add --stdin option
2023-04-06 13:38:21 -07:00
Clement Mabileau 4c643fb321 branch: improve error log on branch not found by checking remotes refs
New git users may want to locally delete remote-tracking branches but
don't really understand how they are distinguished from branches by git.
Then one may naively try:
`git branch -d foo/bar` and get a correct error `branch foo/bar not
found` but hard to understand for a newbie, this patch aims to guide one
in such case.

when failing to delete a branch with `git branch -d <branch>` because
of branch not found, try to find a **remote refs** matching `<branch>`
and if so, add an hint:
`Did you forget --remote?` to the error message

Signed-off-by: Clement Mabileau <mabileau.clement@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-06 13:11:26 -07:00
Tao Klerks 42943b950e mergetool: new config guiDefault supports auto-toggling gui by DISPLAY
When no merge.tool or diff.tool is configured or manually selected, the
selection of a default tool is sensitive to the DISPLAY variable; in a
GUI session a gui-specific tool will be proposed if found, and
otherwise a terminal-based one. This "GUI-optimizing" behavior is
important because a GUI can make a huge difference to a user's ability
to understand and correctly complete a non-trivial conflicting merge.

Some time ago the merge.guitool and diff.guitool config options were
introduced to enable users to configure both a GUI tool, and a non-GUI
tool (with fallback if no GUI tool configured), in the same environment.

Unfortunately, the --gui argument introduced to support the selection of
the guitool is still explicit. When using configured tools, there is no
equivalent of the no-tool-configured "propose a GUI tool if we are in a GUI
environment" behavior.

As proposed in <xmqqmtb8jsej.fsf@gitster.g>, introduce new configuration
options, difftool.guiDefault and mergetool.guiDefault, supporting a special
value "auto" which causes the corresponding tool or guitool to be selected
depending on the presence of a non-empty DISPLAY value. Also support "true"
to say "default to the guitool (unless --no-gui is passed on the
commandline)", and "false" as the previous default behavior when these new
configuration options are not specified.

Signed-off-by: Tao Klerks <tao@klerks.biz>
Acked-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-05 21:03:29 -07:00
Junio C Hamano 8b214c2e9d clone: propagate object-format when cloning from void
A user could prepare an empty repository and set it to use SHA256 as
the object format.  The new repository created by "git clone" from
such a repository however would not record that it is expecting
objects in the same SHA256 format.  This works as expected if the
source repository is not empty.

Just like we started copying the name of the primary branch from the
remote repository even if it is unborn in 3d8314f8 (clone: propagate
empty remote HEAD even with other branches, 2022-07-07), lift the
code that records the object format out of the block executed only
when cloning from an instantiated repository, so that it works also
when cloning from an empty repository.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-05 14:17:00 -07:00
Junio C Hamano 5e4070e128 Merge branch 'jk/really-deprecate-pack-redundant'
"git pack-redundant" gave a warning when run, as the command has
outlived its usefulness long ago and is nominated for future
removal.  Now we escalate to give an error.

* jk/really-deprecate-pack-redundant:
  pack-redundant: escalate deprecation warning to an error
2023-04-04 14:28:29 -07:00
Junio C Hamano 9142fce9b0 Merge branch 'ah/rebase-merges-config'
Streamline --rebase-merges command line option handling and
introduce rebase.merges configuration variable.

* ah/rebase-merges-config:
  rebase: add a config option for --rebase-merges
  rebase: deprecate --rebase-merges=""
  rebase: add documentation and test for --no-rebase-merges
2023-04-04 14:28:28 -07:00
Junio C Hamano 7e13d654c2 Merge branch 'jk/fast-export-cleanup'
Code clean-up.

* jk/fast-export-cleanup:
  fast-export: drop unused parameter from anonymize_commit_message()
  fast-export: drop data parameter from anonymous generators
  fast-export: de-obfuscate --anonymize-map handling
  fast-export: factor out anonymized_entry creation
  fast-export: simplify initialization of anonymized hashmaps
  fast-export: drop const when storing anonymized values
2023-04-04 14:28:27 -07:00
Shuqi Liang 1a65b41b38 write-tree: integrate with sparse index
Update 'git write-tree' to allow using the sparse-index in memory
without expanding to a full one.

The recursive algorithm for update_one() was already updated in 2de37c5
(cache-tree: integrate with sparse directory entries, 2021-03-03) to
handle sparse directory entries in the index. Hence we can just set the
requires-full-index to false for "write-tree".

The `p2000` tests demonstrate a ~96% execution time reduction for 'git
write-tree' using a sparse index:

Test                                           before  after
-----------------------------------------------------------------
2000.78: git write-tree (full-v3)              0.34    0.33 -2.9%
2000.79: git write-tree (full-v4)              0.32    0.30 -6.3%
2000.80: git write-tree (sparse-v3)            0.47    0.02 -95.8%
2000.81: git write-tree (sparse-v4)            0.45    0.02 -95.6%

Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-04 12:50:54 -07:00
Junio C Hamano e7dca80692 Merge branch 'ab/remove-implicit-use-of-the-repository' into en/header-split-cache-h
* ab/remove-implicit-use-of-the-repository:
  libs: use "struct repository *" argument, not "the_repository"
  post-cocci: adjust comments for recent repo_* migration
  cocci: apply the "revision.h" part of "the_repository.pending"
  cocci: apply the "rerere.h" part of "the_repository.pending"
  cocci: apply the "refs.h" part of "the_repository.pending"
  cocci: apply the "promisor-remote.h" part of "the_repository.pending"
  cocci: apply the "packfile.h" part of "the_repository.pending"
  cocci: apply the "pretty.h" part of "the_repository.pending"
  cocci: apply the "object-store.h" part of "the_repository.pending"
  cocci: apply the "diff.h" part of "the_repository.pending"
  cocci: apply the "commit.h" part of "the_repository.pending"
  cocci: apply the "commit-reach.h" part of "the_repository.pending"
  cocci: apply the "cache.h" part of "the_repository.pending"
  cocci: add missing "the_repository" macros to "pending"
  cocci: sort "the_repository" rules by header
  cocci: fix incorrect & verbose "the_repository" rules
  cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-04 08:25:52 -07:00
Raghul Nanth A 748b8d669a describe: enable sparse index for describe
git describe compares the index with the working tree when (and only
when) it is run with the "--dirty" flag. This is done by the
run_diff_index() function. The function has been made aware of the
sparse-index in the series that led to 8d2c3732 (Merge branch
'ld/sparse-diff-blame', 2021-12-21). Hence we can just set the
requires-full-index to false for "describe".

Performance metrics

  Test                                                     HEAD~1            HEAD
  -------------------------------------------------------------------------------------------------
  2000.2: git describe --dirty (full-v3)                   0.08(0.09+0.01)   0.08(0.06+0.03) +0.0%
  2000.3: git describe --dirty (full-v4)                   0.09(0.07+0.03)   0.08(0.05+0.04) -11.1%
  2000.4: git describe --dirty (sparse-v3)                 0.88(0.82+0.06)   0.02(0.01+0.05) -97.7%
  2000.5: git describe --dirty (sparse-v4)                 0.68(0.60+0.08)   0.02(0.02+0.04) -97.1%
  2000.6: echo >>new && git describe --dirty (full-v3)     0.08(0.04+0.05)   0.08(0.05+0.04) +0.0%
  2000.7: echo >>new && git describe --dirty (full-v4)     0.08(0.07+0.03)   0.08(0.05+0.04) +0.0%
  2000.8: echo >>new && git describe --dirty (sparse-v3)   0.75(0.69+0.07)   0.02(0.03+0.03) -97.3%
  2000.9: echo >>new && git describe --dirty (sparse-v4)   0.81(0.73+0.09)   0.02(0.01+0.05) -97.5%

Signed-off-by: Raghul Nanth A <nanth.raghul@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-03 11:30:23 -07:00
Derrick Stolee 25bccb4b79 fetch: download bundles once, even with --all
When fetch.bundleURI is set, 'git fetch' downloads bundles from the
given bundle URI before fetching from the specified remote. However,
when using non-file remotes, 'git fetch --all' will launch 'git fetch'
subprocesses which then read fetch.bundleURI and fetch the bundle list
again. We do not expect the bundle list to have new information during
these multiple runs, so avoid these extra calls by un-setting
fetch.bundleURI in the subprocess arguments.

Be careful to skip fetching bundles for the empty bundle string.
Fetching bundles from the empty list presents some interesting test
failures.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-31 10:07:33 -07:00
Junio C Hamano dbb4102f7b Merge branch 'sg/parse-options-h-users'
Code clean-up to include and/or uninclude parse-options.h file as
needed.

* sg/parse-options-h-users:
  treewide: remove unnecessary inclusions of parse-options.h from headers
  treewide: include parse-options.h in source files
2023-03-30 13:47:11 -07:00
Jeff King 6ba21fa65c mark "argv" as unused when we check argc
A few commands don't take any options at all, and confirm this by
checking argc. After that they have no need to look at argv, but we're
still stuck with it by convention. Let's annotate these cases so that
the compiler doesn't complain with -Wunused-parameter.

Note that in scalar and get-tar-commit-id, we're forced to keep argv by
calling convention (the functions must match cmd_main() and builtin
cmd_foo() conventions, respectively). In diff, these are subcommand
modes that we call individually, so we _could_ just drop the argv
parameters entirely. But it's weird to pass argc without argv, and it
implies that the caller knows that the subcommands aren't interested in
further arguments. It's less confusing to just keep them and silence the
compiler warning.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 14:11:24 -07:00
Jeff King 5247b762d0 builtins: mark unused prefix parameters
All builtins receive a "prefix" parameter, but it is only useful if they
need to adjust filenames given by the user on the command line. For
builtins that do not even call parse_options(), they often don't look at
the prefix at all, and -Wunused-parameter complains.

Let's annotate those to silence the compiler warning. I gave a quick
scan of each of these cases, and it seems like they don't have anything
they _should_ be using the prefix for (i.e., there is no hidden bug that
we are missing). The only questionable cases I saw were:

  - in git-unpack-file, we create a tempfile which will always be at the
    root of the repository, even if the command is run from a subdir.
    Arguably this should be created in the subdir from which we're run
    (as we report the path only as a relative name). However, nobody has
    complained, and I'm hesitant to change something that is deep
    plumbing going back to April 2005 (though I think within our
    scripts, the sole caller in git-merge-one-file would be OK, as it
    moves to the toplevel itself).

  - in fetch-pack, local-filesystem remotes are taken as relative to the
    project root, not the current directory. So:

       git init server.git
       [...put stuff in server.git...]
       git init client.git
       cd client.git
       mkdir subdir
       cd subdir
       git fetch-pack ../../server.git ...

    won't work, as we quietly move to the top of the repository before
    interpreting the path (so "../server.git" would work). This is
    weird, but again, nobody has complained and this is how it has
    always worked. And this is how "git fetch" works, too. Plus it
    raises questions about how a configured remote like:

      git config remote.origin.url ../server.git

    should behave. I can certainly come up with a reasonable set of
    behavior, but it may not be worth stirring up complications in a
    plumbing tool.

So I've left the behavior untouched in both of those cases. If anybody
really wants to revisit them, it's easy enough to drop the UNUSED
marker. This commit is just about removing them as obstacles to turning
on -Wunused-parameter all the time.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 14:11:24 -07:00
Jeff King 7915691377 builtins: annotate always-empty prefix parameters
It's usually a bad idea for a builtin's cmd_foo() to ignore the "prefix"
argument it gets, as it needs to prepend that string when accessing any
paths given by the user.

But if a builtin does not ask for the git wrapper to run repository
setup (via the RUN_SETUP or RUN_SETUP_GENTLY flags), then we know the
prefix will always be NULL (it is adjusting for the chdir() done during
repo setup, but there cannot be one if we did not set up the repo). In
those cases it's OK to ignore "prefix", but it's worth annotating for a
few reasons:

  1. It serves as documentation to somebody reading the code about what
     we expect.

  2. If the flags in git.c ever change, the run-time assertion may help
     detect the problem (though only if the command is run from a
     subdirectory of the repository).

  3. It notes to the compiler that we are OK ignoring "prefix". In
     particular, this silences -Wunused-parameter. It _could_ also help
     the compiler generate better code (because it will know the prefix
     is NULL), but in practice this is quite unlikely to matter.

Note that I've only added this annotation to commands which triggered
-Wunused-parameter. It would be correct to add it to any builtin which
doesn't ask for RUN_SETUP, but most of the rest of them do the sensible
thing with "prefix" by passing it to parse_options(). So they're much
more likely to just work if they ever switched to RUN_SETUP, and aren't
worth annotating.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 14:11:24 -07:00
Jeff King 836c8ceb7a builtins: always pass prefix to parse_options()
Our builtins receive a "prefix" argument as part of their cmd_foo()
function. We should always pass this to parse_options() if we're calling
it, as it may be used for OPT_FILENAME() options.

In the cases here, there's no option that would use it, so we're not
fixing any bug. This is just future-proofing and setting a good example
(plus quelling some -Wunused-parameter warnings).

Note in the case of revert/cherry-pick, that we plumb the prefix through
to run_sequencer(), as those builtins are just thin wrappers around it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 14:11:24 -07:00
Jeff King 9dc607f1c2 fast-import: fix file access when run from subdir
In cmd_fast_import(), we ignore the "prefix" argument entirely, even
though it tells us how we may have changed directory to the root of the
repository earlier in the process. Which means that if you run it from a
subdir and point to paths in the filesystem, like:

  cd subdir
  git fast-import --import-marks=foo <dump

then it will look for "foo" in the root of the repository, not the
current directory ("subdir/") which the user would have expected.

We can fix this by recording the prefix and using it as appropriate
whenever we open a file for reading or writing. I found each of these by
looking for cases where we call fopen() within fast-import.c, so this
should cover all cases. The new test triggers each one, as well as
making sure we don't accidentally apply the prefix when --relative-marks
is in use (since that option interprets some paths as relative to a
specific directory).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 14:11:24 -07:00
Junio C Hamano f879501ad0 Merge branch 'jk/fix-proto-downgrade-to-v0'
Transports that do not support protocol v2 did not correctly fall
back to protocol v0 under certain conditions, which has been
corrected.

* jk/fix-proto-downgrade-to-v0:
  git_connect(): fix corner cases in downgrading v2 to v0
2023-03-28 10:51:52 -07:00
Ævar Arnfjörð Bjarmason 3611f7467f for-each-repo: with bad config, don't conflate <path> and <cmd>
Fix a logic error in 4950b2a2b5 (for-each-repo: run subcommands on
configured repos, 2020-09-11). Due to assuming that elements returned
from the repo_config_get_value_multi() call wouldn't be "NULL" we'd
conflate the <path> and <command> part of the argument list when
running commands.

As noted in the preceding commit the fix is to move to a safer
"*_string_multi()" version of the *_multi() API. This change is
separated from the rest because those all segfaulted. In this change
we ended up with different behavior.

When using the "--config=<config>" form we take each element of the
list as a path to a repository. E.g. with a configuration like:

	[repo] list = /some/repo

We would, with this command:

	git for-each-repo --config=repo.list status builtin

Run a "git status" in /some/repo, as:

	git -C /some/repo status builtin

I.e. ask "status" to report on the "builtin" directory. But since a
configuration such as this would result in a "struct string_list *"
with one element, whose "string" member is "NULL":

	[repo] list

We would, when constructing our command-line in
"builtin/for-each-repo.c"...

	strvec_pushl(&child.args, "-C", path, NULL);
	for (i = 0; i < argc; i++)
		strvec_push(&child.args, argv[i]);

...have that "path" be "NULL", and as strvec_pushl() stops when it
sees NULL we'd end with the first "argv" element as the argument to
the "-C" option, e.g.:

	git -C status builtin

I.e. we'd run the command "builtin" in the "status" directory.

In another context this might be an interesting security
vulnerability, but I think that this amounts to a nothingburger on
that front.

A hypothetical attacker would need to be able to write config for the
victim to run, if they're able to do that there's more interesting
attack vectors. See the "safe.directory" facility added in
8d1a744820 (setup.c: create `safe.bareRepository`, 2022-07-14).

An even more unlikely possibility would be an attacker able to
generate the config used for "for-each-repo --config=<key>", but
nothing else (e.g. an automated system producing that list).

Even in that case the attack vector is limited to the user running
commands whose name matches a directory that's interesting to the
attacker (e.g. a "log" directory in a repository). The second
argument (if any) of the command is likely to make git die without
doing anything interesting (e.g. "-p" to "log", there being no "-p"
built-in command to run).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:37:53 -07:00
Ævar Arnfjörð Bjarmason 9e2d884d0f config API: add "string" version of *_value_multi(), fix segfaults
Fix numerous and mostly long-standing segfaults in consumers of
the *_config_*value_multi() API. As discussed in the preceding commit
an empty key in the config syntax yields a "NULL" string, which these
users would give to strcmp() (or similar), resulting in segfaults.

As this change shows, most users users of the *_config_*value_multi()
API didn't really want such an an unsafe and low-level API, let's give
them something with the safety of git_config_get_string() instead.

This fix is similar to what the *_string() functions and others
acquired in[1] and [2]. Namely introducing and using a safer
"*_get_string_multi()" variant of the low-level "_*value_multi()"
function.

This fixes segfaults in code introduced in:

  - d811c8e17c (versionsort: support reorder prerelease suffixes, 2015-02-26)
  - c026557a37 (versioncmp: generalize version sort suffix reordering, 2016-12-08)
  - a086f921a7 (submodule: decouple url and submodule interest, 2017-03-17)
  - a6be5e6764 (log: add log.excludeDecoration config option, 2020-04-16)
  - 92156291ca (log: add default decoration filter, 2022-08-05)
  - 50a044f1e4 (gc: replace config subprocesses with API calls, 2022-09-27)

There are now two users ofthe low-level API:

- One in "builtin/for-each-repo.c", which we'll convert in a
  subsequent commit.

- The "t/helper/test-config.c" code added in [3].

As seen in the preceding commit we need to give the
"t/helper/test-config.c" caller these "NULL" entries.

We could also alter the underlying git_configset_get_value_multi()
function to be "string safe", but doing so would leave no room for
other variants of "*_get_value_multi()" that coerce to other types.

Such coercion can't be built on the string version, since as we've
established "NULL" is a true value in the boolean context, but if we
coerced it to "" for use in a list of strings it'll be subsequently
coerced to "false" as a boolean.

The callback pattern being used here will make it easy to introduce
e.g. a "multi" variant which coerces its values to "bool", "int",
"path" etc.

1. 40ea4ed903 (Add config_error_nonbool() helper function,
   2008-02-11)
2. 6c47d0e8f3 (config.c: guard config parser from value=NULL,
   2008-02-11).
3. 4c715ebb96 (test-config: add tests for the config_set API,
   2014-07-28)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:37:53 -07:00
Ævar Arnfjörð Bjarmason f7b2ff9516 for-each-repo: error on bad --config
As noted in 6c62f01552 (for-each-repo: do nothing on empty config,
2021-01-08) this command wants to ignore a non-existing config key,
but let's not conflate that with bad config.

Before this, all these added tests would pass with an exit code of 0.

We could preserve the comment added in 6c62f01552, but now that we're
directly using the documented repo_config_get_value_multi() value it's
just narrating something that should be obvious from the API use, so
let's drop it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:37:53 -07:00
Ævar Arnfjörð Bjarmason a428619309 config API: have *_multi() return an "int" and take a "dest"
Have the "git_configset_get_value_multi()" function and its siblings
return an "int" and populate a "**dest" parameter like every other
git_configset_get_*()" in the API.

As we'll take advantage of in subsequent commits, this fixes a blind
spot in the API where it wasn't possible to tell whether a list was
empty from whether a config key existed. For now we don't make use of
those new return values, but faithfully convert existing API users.

Most of this is straightforward, commentary on cases that stand out:

- To ensure that we'll properly use the return values of this function
  in the future we're using the "RESULT_MUST_BE_USED" macro introduced
  in [1].

  As git_die_config() now has to handle this return value let's have
  it BUG() if it can't find the config entry. As tested for in a
  preceding commit we can rely on getting the config list in
  git_die_config().

- The loops after getting the "list" value in "builtin/gc.c" could
  also make use of "unsorted_string_list_has_string()" instead of using
  that loop, but let's leave that for now.

- In "versioncmp.c" we now use the return value of the functions,
  instead of checking if the lists are still non-NULL.

1. 1e8697b5c4 (submodule--helper: check repo{_submodule,}_init()
   return values, 2022-09-01),

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:37:53 -07:00
Ævar Arnfjörð Bjarmason b83efcecaf config API: add and use a "git_config_get()" family of functions
We already have the basic "git_config_get_value()" function and its
"repo_*" and "configset" siblings to get a given "key" and assign the
last key found to a provided "value".

But some callers don't care about that value, but just want to use the
return value of the "get_value()" function to check whether the key
exist (or another non-zero return value).

The immediate motivation for this is that a subsequent commit will
need to change all callers of the "*_get_value_multi()" family of
functions. In two cases here we (ab)used it to check whether we had
any values for the given key, but didn't care about the return value.

The rest of the callers here used various other config API functions
to do the same, all of which resolved to the same underlying functions
to provide the answer.

Some of these were using either git_config_get_string() or
git_config_get_string_tmp(), see fe4c750fb1 (submodule--helper: fix a
configure_added_submodule() leak, 2022-09-01) for a recent example. We
can now use a helper function that doesn't require a throwaway
variable.

We could have changed git_configset_get_value_multi() (and then
git_config_get_value() etc.) to accept a "NULL" as a "dest" for all
callers, but let's avoid changing the behavior of existing API
users. Having an "unused" value that we throw away internal to
config.c is cheap.

A "NULL as optional dest" pattern is also more fragile, as the intent
of the caller might be misinterpreted if he were to accidentally pass
"NULL", e.g. when "dest" is passed in from another function.

Another name for this function could have been
"*_config_key_exists()", as suggested in [1]. That would work for all
of these callers, and would currently be equivalent to this function,
as the git_configset_get_value() API normalizes all non-zero return
values to a "1".

But adding that API would set us up to lose information, as e.g. if
git_config_parse_key() in the underlying configset_find_element()
fails we'd like to return -1, not 1.

Let's change the underlying configset_find_element() function to
support this use-case, we'll make further use of it in a subsequent
commit where the git_configset_get_value_multi() function itself will
expose this new return value.

This still leaves various inconsistencies and clobbering or ignoring
of the return value in place. E.g here we're modifying
configset_add_value(), but ever since it was added in [2] we've been
ignoring its "int" return value, but as we're changing the
configset_find_element() it uses, let's have it faithfully ferry that
"ret" along.

Let's also use the "RESULT_MUST_BE_USED" macro introduced in [3] to
assert that we're checking the return value of
configset_find_element().

We're leaving the same change to configset_add_value() for some future
series. Once we start paying attention to its return value we'd need
to ferry it up as deep as do_config_from(), and would need to make
least read_{,very_}early_config() and git_protected_config() return an
"int" instead of "void". Let's leave that for now, and focus on
the *_get_*() functions.

1. 3c8687a73e (add `config_set` API for caching config-like files, 2014-07-28)
2. https://lore.kernel.org/git/xmqqczadkq9f.fsf@gitster.g/
3. 1e8697b5c4 (submodule--helper: check repo{_submodule,}_init()
   return values, 2022-09-01),

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:37:52 -07:00
Ævar Arnfjörð Bjarmason 4a93b899c1 libs: use "struct repository *" argument, not "the_repository"
As can easily be seen from grepping in our sources, we had these uses
of "the_repository" in various library code in cases where the
function in question was already getting a "struct repository *"
argument. Let's use that argument instead.

Out of these changes only the changes to "cache-tree.c",
"commit-reach.c", "shallow.c" and "upload-pack.c" would have cleanly
applied before the migration away from the "repo_*()" wrapper macros
in the preceding commits.

The rest aren't new, as we'd previously implicitly refer to
"the_repository", but it's now more obvious that we were doing the
wrong thing all along, and should have used the parameter instead.

The change to change "get_index_format_default(the_repository)" in
"read-cache.c" to use the "r" variable instead should arguably have
been part of [1], or in the subsequent cleanup in [2]. Let's do it
here, as can be seen from the initial code in [3] it's not important
that we use "the_repository" there, but would prefer to always use the
current repository.

This change excludes the "the_repository" use in "upload-pack.c"'s
upload_pack_advertise(), as the in-flight [4] makes that change.

1. ee1f0c242e (read-cache: add index.skipHash config option,
   2023-01-06)
2. 6269f8eaad (treewide: always have a valid "index_state.repo"
   member, 2023-01-17)
3. 7211b9e753 (repo-settings: consolidate some config settings,
   2019-08-13)
4. <Y/hbUsGPVNAxTdmS@coredump.intra.peff.net>

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:46 -07:00
Ævar Arnfjörð Bjarmason c7c33f50bd post-cocci: adjust comments for recent repo_* migration
In preceding commits we changed many calls to macros that were
providing a "the_repository" argument to invoke corresponding repo_*()
function instead. Let's follow-up and adjust references to those in
comments, which coccinelle didn't (and inherently can't) catch.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:46 -07:00
Ævar Arnfjörð Bjarmason 035c7de9e9 cocci: apply the "revision.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"revision.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:46 -07:00
Ævar Arnfjörð Bjarmason b26a71b1be cocci: apply the "rerere.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"rerere.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:46 -07:00
Ævar Arnfjörð Bjarmason 12cb1c10a6 cocci: apply the "refs.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"refs.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:46 -07:00
Ævar Arnfjörð Bjarmason a5183d7696 cocci: apply the "promisor-remote.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"promisor-remote.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:46 -07:00
Ævar Arnfjörð Bjarmason afe27c8894 cocci: apply the "packfile.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"packfile.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:45 -07:00
Ævar Arnfjörð Bjarmason bab821646a cocci: apply the "pretty.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"pretty.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:45 -07:00
Ævar Arnfjörð Bjarmason bc726bd075 cocci: apply the "object-store.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"object-store.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:45 -07:00
Ævar Arnfjörð Bjarmason 085390328f cocci: apply the "diff.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"diff.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:45 -07:00
Ævar Arnfjörð Bjarmason ecb5091fd4 cocci: apply the "commit.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"commit.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:45 -07:00
Ævar Arnfjörð Bjarmason cb338c23d6 cocci: apply the "commit-reach.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"commit-reach.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:36 -07:00
Ævar Arnfjörð Bjarmason d850b7a545 cocci: apply the "cache.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"cache.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:36 -07:00
William Sprent 00408adeac builtin/sparse-checkout: add check-rules command
There exists no direct way to interrogate git about which paths are
matched by a given set of sparsity rules. It is possible to get this
information from git, but it includes checking out the commit that
contains the paths, applying the sparse checkout patterns and then using
something like 'git ls-files -t' to check if the skip worktree bit is
set. This works in some case, but there are cases where it is awkward or
infeasible to generate a checkout for this purpose.

Exposing the pattern matching of sparse checkout enables more tooling to
be built and avoids a situation where tools that want to reason about
sparse checkouts start containing parallel implementation of the rules.
To accommodate this, add a 'check-rules' subcommand to the
'sparse-checkout' builtin along the lines of the 'git check-ignore' and
'git check-attr' commands. The new command accepts a list of paths on
stdin and outputs just the ones the match the sparse checkout.

To allow for use in a bare repository and to allow for interrogating
about other patterns than the current ones, include a '--rules-file'
option which allows the caller to explicitly pass sparse checkout rules
in the format accepted by 'sparse-checkout set --stdin'.

To allow for reuse of the handling of input patterns for the
'--rules-file' flag, modify 'add_patterns_from_input()' to be able to
read from a 'FILE' instead of just stdin.

To allow for reuse of the logic which decides whether or not rules
should be interpreted as cone-mode patterns, split that part out of
'update_modes()' such that can be called without modifying the config.

An alternative could have been to create a new 'check-sparsity' command.
However, placing it under 'sparse-checkout' allows for a) more easily
re-using the sparse checkout pattern matching and cone/non-code mode
handling, and b) keeps the documentation for the command next to the
experimental warning and the cone-mode discussion.

Signed-off-by: William Sprent <williams@unity3d.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-27 10:51:12 -07:00
William Sprent 24fc2cde64 builtin/sparse-checkout: remove NEED_WORK_TREE flag
In preparation for adding a sub-command to 'sparse-checkout' that can be
run in a bare repository, remove the 'NEED_WORK_TREE' flag from its
entry in the 'commands' array of 'git.c'.

To avoid that this changes any behaviour, add calls to
'setup_work_tree()' to all of the 'sparse-checkout' sub-commands and add
tests that verify that 'sparse-checkout <cmd>' still fail with a clear
error message telling the user that the command needs a work tree.

Signed-off-by: William Sprent <williams@unity3d.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-27 10:43:51 -07:00
Rubén Justo 3521c63213 branch: avoid unnecessary worktrees traversals
When we rename a branch ref, we need to update any worktree that have
its HEAD pointing to the branch ref being renamed, so to make it use the
new ref name.

If we know in advance that we're renaming a branch that is not currently
checked out in any worktree, we can skip this step entirely.  Let's do
it so.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-27 09:40:15 -07:00
Rubén Justo a675ad1708 branch: rename orphan branches in any worktree
In cfaff3aac (branch -m: allow renaming a yet-unborn branch, 2020-12-13)
we added support for renaming an orphan branch when that branch is
checked out in the current worktree.

Let's also allow renaming an orphan branch checked out in a worktree
different than the current one.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-27 09:40:15 -07:00
Rubén Justo 7a6ccdfb4e branch: description for orphan branch errors
In bcfc82bd48 (branch: description for non-existent branch errors,
2022-10-08) we checked the HEAD in the current worktree to detect if the
branch to operate with is an orphan branch, so as to avoid the confusing
error: "No branch named...".

If we are asked to operate with an orphan branch in a different working
tree than the current one, we need to check the HEAD in that different
working tree.

Let's extend the check we did in bcfc82bd48, to check the HEADs in all
worktrees linked to the current repository, using the helper introduced
in 31ad6b61bd (branch: add branch_checked_out() helper, 2022-06-15).

The helper, branch_checked_out(), does its work obtaining internally a
list of worktrees linked to the current repository.  Obtaining that list
is not a lightweight work because it implies disk access.

In copy_or_rename_branch() we already have a list of worktrees.  Let's
use that already obtained list, and avoid using here the helper.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-27 09:40:14 -07:00
Rubén Justo d7f4ca61b5 branch: use get_worktrees() in copy_or_rename_branch()
Obtaining the list of worktrees, using get_worktrees(), is not a
lightweight operation, because it involves reading from disk.

Let's stop calling get_worktrees() in reject_rebase_or_bisect_branch()
and in replace_each_worktree_head_symref().  Make them receive the list
of worktrees from their only caller, copy_or_rename_branch().

Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-27 09:40:14 -07:00
Rubén Justo 2e8af499ff branch: test for failures while renaming branches
When we introduced replace_each_worktree_head_symref() in 70999e9cec
(branch -m: update all per-worktree HEADs, 2016-03-27), we implemented a
best effort approach.

If we are asked to rename a branch that is simultaneously checked out in
multiple worktrees, we try to update all of those worktrees.  If we fail
updating any of them, we die() as a signal that something has gone
wrong.  However, at this point, the branch ref has already been renamed
and also updated the HEADs of the successfully updated worktrees.
Despite returning an error, we do not try to rollback those changes.

Let's add a test to notice if we change this behavior in the future.

In next commits we will change replace_each_worktree_head_symref() to
work more closely with its only caller, copy_or_rename_branch().  Let's
move the former closer to its caller, to facilitate those changes.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-27 09:40:14 -07:00
Alex Henrie 6605fb70cb rebase: add a config option for --rebase-merges
The purpose of the new option is to accommodate users who would like
--rebase-merges to be on by default and to facilitate turning on
--rebase-merges by default without configuration in a future version of
Git.

Name the new option rebase.rebaseMerges, even though it is a little
redundant, for consistency with the name of the command line option and
to be clear when scrolling through values in the [rebase] section of
.gitconfig.

Support setting rebase.rebaseMerges to the nonspecific value "true" for
users who don't need to or don't want to learn about the difference
between rebase-cousins and no-rebase-cousins.

Make --rebase-merges without an argument on the command line override
any value of rebase.rebaseMerges in the configuration, for consistency
with other command line flags with optional arguments that have an
associated config option.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-27 09:32:49 -07:00
Alex Henrie 33561f5170 rebase: deprecate --rebase-merges=""
The unusual syntax --rebase-merges="" (that is, --rebase-merges with an
empty string argument) has been an undocumented synonym of
--rebase-merges without an argument. Deprecate that syntax to avoid
confusion when a rebase.rebaseMerges config option is introduced, where
rebase.rebaseMerges="" will be equivalent to --no-rebase-merges.

It is not likely that anyone is actually using this syntax, but just in
case, deprecate the empty string argument instead of dropping support
for it immediately.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-27 09:32:49 -07:00
Jeff King 4406522b76 pack-redundant: escalate deprecation warning to an error
In c3b58472be (pack-redundant: gauge the usage before proposing its
removal, 2020-08-25), we added a big, ugly warning when pack-redundant
is run. The plan there indicated that we would ratchet that up to an
error before finally removing it. Since it has been 2.5 years (and 9
releases) since then, let's continue with the plan.

Note that we did get one bite on the warning, which was somebody asking
about alternatives:

  https://lore.kernel.org/git/CAKvOHKAFXQwt4D8yUCCkf_TQL79mYaJ=KAKhtpDNTvHJFuX1NA@mail.gmail.com/

but we didn't undo the ugly warning (and the advice continues to be "use
repack -d" instead).

There was also some discussion around the time of the deprecation that
pack-redundant was invoked by the bitbake tool, and it still seems to do
so now:

  https://git.openembedded.org/bitbake

That use should probably just go away in favor of an occasional repack
(which probably even happens via auto-gc after fetch these days).

But since neither of those data points caused us to cancel the
deprecation plan by dropping the warning, it seems like we should
proceed with the next step.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-23 13:56:02 -07:00
Jeff King d051f1718e fast-export: drop unused parameter from anonymize_commit_message()
As the comment above the function indicates, we do not bother actually
storing commit messages in our anonymization map. But we still take the
message as a parameter, and just ignore it. Let's stop doing that, which
will make -Wunused-parameter happier.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-22 15:37:09 -07:00
Jeff King 65c756fff0 fast-export: drop data parameter from anonymous generators
The anonymization code has a specific generator callback for each type
of data (e.g., one for paths, one for oids, and so on). These all take a
"data" parameter, but none of them use it for anything. Which is not
surprising, as the point is to generate a new name independent of any
input, and each function keeps its own static counter.

We added the extra pointer in d5bf91fde4 (fast-export: add a "data"
callback parameter to anonymize_str(), 2020-06-23) to handle
--anonymize-map parsing, but that turned out to be awkward itself, and
was recently dropped.

So let's get rid of this "data" parameter that nobody is using, both
from the generators and from anonymize_str() which plumbed it through.
This simplifies the code, and makes -Wunused-parameter happier.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-22 15:37:09 -07:00
Jeff King aa548459a0 fast-export: de-obfuscate --anonymize-map handling
When we handle an --anonymize-map option, we parse the orig/anon pair,
and then feed the "orig" string to anonymize_str(), along with a
generator function that duplicates the "anon" string to be cached in the
map.

This works, because anonymize_str() says "ah, there is no mapping yet
for orig; I'll add one from the generator". But there are some
downsides:

  1. It's a bit too clever, as it's not obvious what the code is trying
     to do or why it works.

  2. It requires allowing generator functions to take an extra void
     pointer, which is not something any of the normal callers of
     anonymize_str() want.

  3. It does the wrong thing if the same token is provided twice.
     When there are conflicting options, like:

       git fast-export --anonymize \
         --anonymize-map=foo:one \
	 --anonymize-map=foo:two

     we usually let the second one override the first. But by using
     anonymize_str(), which has first-one-wins logic, we do the
     opposite.

So instead of relying on anonymize_str(), let's directly add the entry
ourselves. We can tweak the tests to show that we handle overridden
options correctly now.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-22 15:37:09 -07:00
Jeff King dcc4e134aa fast-export: factor out anonymized_entry creation
When anonymizing output, there's only one spot where we generate new
entries to add to our hashmap: when anonymize_str() doesn't find an
entry, we use the generate() callback to make one and add it. Let's pull
that into its own function in preparation for another caller.

Note that we'll add one extra feature. In anonymize_str(), we know that
we won't find an existing entry in the hashmap (since it will only try
to add after failing to find one). But other callers won't have the same
behavior, so we should catch this case and free the now-dangling entry.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-22 15:37:09 -07:00
Jeff King d6484e9fab fast-export: simplify initialization of anonymized hashmaps
We take pains to avoid doing a lookup on a hashmap which has not been
initialized with hashmap_init(). That was necessary back when this code
was written. But hashmap_get() became safer in b7879b0ba6 (hashmap:
allow re-use after hashmap_free(), 2020-11-02). Since then it's OK to
call functions on a zero-initialized table; it will just correctly
return NULL, since there is no match.

This simplifies the code a little, and also lets us keep the
initialization line closer to when we add an entry (which is when the
hashmap really does need to be totally initialized). That will help
later refactoring.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-22 15:37:08 -07:00
Jeff King 76e50f7fbc fast-export: drop const when storing anonymized values
We store anonymized values as pointers to "const char *", since they are
conceptually const to callers who use them. But they are actually
allocated strings whose memory is owned by the struct.

The ownership mismatch hasn't been a big deal since we never free() them
(they are held until the program ends), but let's switch them to "char *"
in preparation for changing that.

Since most code only accesses them via anonymize_str(), it can continue
to narrow them to "const char *" in its return value.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-22 15:37:08 -07:00
Junio C Hamano ea09dff59a Merge branch 'ps/receive-pack-unlock-before-die'
"git receive-pack" that responds to "git push" requests failed to
clean a stale lockfile when killed in the middle, which has been
corrected.

* ps/receive-pack-unlock-before-die:
  receive-pack: fix stale packfile locks when dying
2023-03-21 14:18:55 -07:00
Junio C Hamano 1071deae00 Merge branch 'aj/ls-files-format-fix'
Fix for a "ls-files --format="%(path)" that produced nonsense
output, which was a bug in 2.38.

* aj/ls-files-format-fix:
  ls-files: fix "--format" output of relative paths
2023-03-21 14:18:55 -07:00
Junio C Hamano 15108de2fa Merge branch 'jk/format-patch-ignore-noprefix'
"git format-patch" honors the src/dst prefixes set to nonstandard
values with configuration variables like "diff.noprefix", causing
receiving end of the patch that expects the standard -p1 format to
break.  Teach "format-patch" to ignore end-user configuration and
always use the standard prefixes.

This is a backward compatibility breaking change.

* jk/format-patch-ignore-noprefix:
  rebase: prefer --default-prefix to --{src,dst}-prefix for format-patch
  format-patch: add format.noprefix option
  format-patch: do not respect diff.noprefix
  diff: add --default-prefix option
  t4013: add tests for diff prefix options
  diff: factor out src/dst prefix setup
2023-03-21 14:18:55 -07:00
Elijah Newren d48be35ca6 write-or-die.h: move declarations for write-or-die.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:54 -07:00
Elijah Newren e38da487cc setup.h: move declarations for setup.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:54 -07:00
Elijah Newren 32a8f51061 environment.h: move declarations for environment.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:53 -07:00
Elijah Newren d5ebb50dcb wrapper.h: move declarations for wrapper.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:53 -07:00
Elijah Newren 0b027f6ca7 abspath.h: move absolute path functions from cache.h
This is another step towards letting us remove the include of cache.h in
strbuf.c.  It does mean that we also need to add includes of abspath.h
in a number of C files.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:52 -07:00
Elijah Newren 7ee24e18e5 environment: move comment_line_char from cache.h
This is one step towards making strbuf.c not depend upon cache.h.
Additional steps will follow in subsequent commits.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:52 -07:00
Elijah Newren f394e093df treewide: be explicit about dependence on gettext.h
Dozens of files made use of gettext functions, without explicitly
including gettext.h.  This made it more difficult to find which files
could remove a dependence on cache.h.  Make C files explicitly include
gettext.h if they are using it.

However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:51 -07:00
Derrick Stolee 49abcd21da for-each-ref: add ahead-behind format atom
The previous change implemented the ahead_behind() method, including an
algorithm to compute the ahead/behind values for a number of commit tips
relative to a number of commit bases. Now, integrate that algorithm as
part of 'git for-each-ref' hidden behind a new format atom,
ahead-behind. This naturally extends to 'git branch' and 'git tag'
builtins, as well.

This format allows specifying multiple bases, if so desired, and all
matching references are compared against all of those bases. For this
reason, failing to read a reference provided from these atoms results in
an error.

In order to translate the ahead_behind() method information to the
format output code in ref-filter.c, we must populate arrays of
ahead_behind_count structs. In struct ref_array, we store the full array
that will be passed to ahead_behind(). In struct ref_array_item, we
store an array of pointers that point to the relvant items within the
full array. In this way, we can pull all relevant ahead/behind values
directly when formatting output for a specific item. It also ensures the
lifetime of the ahead_behind_count structs matches the time that the
array is being used.

Add specific tests of the ahead/behind counts in t6600-test-reach.sh, as
it has an interesting repository shape. In particular, its merging
strategy and its use of different commit-graphs would demonstrate over-
counting if the ahead_behind() method did not already account for that
possibility.

Also add tests for the specific for-each-ref, branch, and tag builtins.
In the case of 'git tag', there are intersting cases that happen when
some of the selected tips are not commits. This requires careful logic
around commits_nr in the second loop of filter_ahead_behind(). Also, the
test in t7004 is carefully located to avoid being dependent on the GPG
prereq. It also avoids using the test_commit helper, as that will add
ticks to the time and disrupt the expected timestamps in later tag
tests.

Also add performance tests in a new p1300-graph-walks.sh script. This
will be useful for more uses in the future, but for now compare the
ahead-behind counting algorithm in 'git for-each-ref' to the naive
implementation by running 'git rev-list --count' processes for each
input.

For the Git source code repository, the improvement is already obvious:

Test                                            this tree
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref   0.07(0.07+0.00)
1500.3: ahead-behind counts: git branch         0.07(0.06+0.00)
1500.4: ahead-behind counts: git tag            0.07(0.06+0.00)
1500.5: ahead-behind counts: git rev-list       1.32(1.04+0.27)

But the standard performance benchmark is the Linux kernel repository,
which demosntrates a significant improvement:

Test                                            this tree
---------------------------------------------------------------
1500.2: ahead-behind counts: git for-each-ref   0.27(0.24+0.02)
1500.3: ahead-behind counts: git branch         0.27(0.24+0.03)
1500.4: ahead-behind counts: git tag            0.28(0.27+0.01)
1500.5: ahead-behind counts: git rev-list       4.57(4.03+0.54)

The 'git rev-list' test exists in this change as a demonstration, but it
will be removed in the next change to avoid wasting time on this
comparison.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 12:17:33 -07:00
Derrick Stolee b73dec5530 for-each-ref: add --stdin option
When a user wishes to input a large list of patterns to 'git
for-each-ref' (likely a long list of exact refs) there are frequently
system limits on the number of command-line arguments.

Add a new --stdin option to instead read the patterns from standard
input. Add tests that check that any unrecognized arguments are
considered an error when --stdin is provided. Also, an empty pattern
list is interpreted as the complete ref set.

When reading from stdin, we populate the filter.name_patterns array
dynamically as opposed to pointing to the 'argv' array directly. This is
simple when using a strvec, as it is NULL-terminated in the same way. We
then free the memory directly from the strvec.

Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 12:17:32 -07:00
SZEDER Gábor 49fd551194 treewide: include parse-options.h in source files
The builtins 'ls-remote', 'pack-objects', 'receive-pack', 'reflog' and
'send-pack' use parse_options(), but their source files don't directly
include 'parse-options.h'.  Furthermore, the source files
'diagnose.c', 'list-objects-filter-options.c', 'remote.c' and
'send-pack.c' define option parsing callback functions, while
'revision.c' defines an option parsing helper function, and thus need
access to various fields in 'struct option' and 'struct
parse_opt_ctx_t', but they don't directly include 'parse-options.h'
either.  They all can still be built, of course, because they include
one of the header files that does include 'parse-options.h' (though
unnecessarily, see the next commit).

Add those missing includes to these files, as our general rule is that
"a C file must directly include the header files that declare the
functions and the types it uses".

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 11:26:47 -07:00
Patrick Steinhardt d6606e02aa fetch: centralize printing of reference updates
In order to print updated references during a fetch, the two different
call sites that do this will first call `format_display()` followed by a
call to `fputs()`. This is needlessly roundabout now that we have the
`display_state` structure that encapsulates all of the printing logic
for references.

Move displaying the reference updates into `format_display()` and rename
it to `display_ref_update()` to better match its new purpose, which
finalizes the conversion to make both the formatting and printing logic
of reference updates self-contained. This will make it easier to add new
output formats and printing to a different file descriptor than stderr.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 11:02:43 -07:00
Patrick Steinhardt c4ef5edbc9 fetch: centralize logic to print remote URL
When fetching from a remote, we not only print the actual references
that have changed, but will also print the URL from which we have
fetched them to standard output. The logic to handle this is duplicated
across two different callsites with some non-trivial logic to compute
the anonymized URL. Furthermore, we're using global state to track
whether we have already shown the URL to the user or not.

Refactor the code by moving it into `format_display()`. Like this, we
can convert the global variable into a member of `display_state`. And
second, we can deduplicate the logic to compute the anonymized URL.

This also works as expected when fetching from multiple remotes, for
example via a group of remotes, as we do this by forking a standalone
git-fetch(1) process per remote that is to be fetched.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 11:02:43 -07:00
Patrick Steinhardt 331b7d29f0 fetch: centralize handling of per-reference format
The function `format_display()` is used to print a single reference
update to a buffer which will then ultimately be printed by the caller.
This architecture causes us to duplicate some logic across the different
callsites of this function. This makes it hard to follow the code as
some parts of the logic are located in one place, while other parts of
the logic are located in a different place. Furthermore, by having the
logic scattered around it becomes quite hard to implement a new output
format for the reference updates.

We can make the logic a whole lot easier to understand by making the
`format_display()` function self-contained so that it handles formatting
and printing of the references. This will eventually allow us to easily
implement a completely different output format, but also opens the door
to conditionally print to either stdout or stderr depending on the
output format.

As a first step towards that goal we move the formatting directive used
by both callers to print a single reference update into this function.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 11:02:43 -07:00
Patrick Steinhardt 7c978db889 fetch: pass the full local reference name to format_display
Before printing the name of the local references that would be updated
by a fetch we first prettify the reference name. This is done at the
calling side so that `format_display()` never sees the full name of the
local reference. This restricts our ability to introduce new output
formats that might want to print the full reference name.

Right now, all callsites except one are prettifying the reference name
anyway. And the only callsite that doesn't passes `FETCH_HEAD` as the
hardcoded reference name to `format_display()`, which would never be
changed by a call to `prettify_refname()` anyway. So let's refactor the
code to pass in the full local reference name and then prettify it in
the formatting code.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 11:02:43 -07:00