"cd sub && git grep -f patterns" tried to read "patterns" file at
the top level of the working tree; it has been corrected to read
"sub/patterns" instead.
* jc/grep-f-relative-to-cwd:
grep: -f <path> is relative to $cwd
Allow users to limit or exclude files based on file attributes
during git-add and git-stash.
For example, the chromium project would like to use
$ git add . ':(exclude,attr:submodule)'
as submodules are managed by an external tool, forbidding end users
to record changes with "git add". Allowing "git add" to often
records changes that users do not want in their commits.
This commit does not change any attr magic implementation. It is
only adding attr as an allowed pathspec in git-add and git-stash,
which was previously blocked by GUARD_PATHSPEC and a pathspec mask
in parse_pathspec()).
However, we fix a bug in prefix_magic() where attr values were
unintentionally removed. This was triggerable when parse_pathspec()
is called with PATHSPEC_PREFIX_ORIGIN as a flag, which was the case
for git-stash (Bug originally filed here [*])
Furthermore, while other commands hit this code path it did not
result in unexpected behavior because this bug only impacts the
pathspec->items->original field which is NOT used to filter
paths. However, git-stash does use pathspec->items->original when
building args used to call other git commands. (See add_pathspecs()
usage and implementation in stash.c)
It is possible that when the attr pathspec feature was first added
in b0db704652 (pathspec: allow querying for attributes, 2017-03-13),
"PATHSPEC_ATTR" was just unintentionally left out of a few
GUARD_PATHSPEC() invocations.
Later, to get a more user-friendly error message when attr was used
with git-add, PATHSPEC_ATTR was added as a mask to git-add's
invocation of parse_pathspec() 84d938b732 (add: do not accept
pathspec magic 'attr', 2018-09-18). However, this user-friendly
error message was never added for git-stash.
[Reference]
* https://lore.kernel.org/git/CAMmZTi-0QKtj7Q=sbC5qhipGsQxJFOY-Qkk1jfkRYwfF5FcUVg@mail.gmail.com/)
Signed-off-by: Joanna Wang <jojwang@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update "git maintainance" timers' implementation based on systemd
timers to work with WSL.
* js/systemd-timers-wsl-fix:
maintenance(systemd): support the Windows Subsystem for Linux
"git grep -e A --no-or -e B" is accepted, even though the negation
of "or" did not mean anything, which has been tightened.
* rs/grep-no-no-or:
grep: reject --no-or
"git diff --no-such-option" and other corner cases around the exit
status of the "diff" command has been corrected.
* jk/diff-result-code-cleanup:
diff: drop useless "status" parameter from diff_result_code()
diff: drop useless return values in git-diff helpers
diff: drop useless return from run_diff_{files,index} functions
diff: die when failing to read index in git-diff builtin
diff: show usage for unknown builtin_diff_files() options
diff-files: avoid negative exit value
diff: spell DIFF_INDEX_CACHED out when calling run_diff_index()
transfer.unpackLimit ought to be used as a fallback, but overrode
fetch.unpackLimit and receive.unpackLimit instead.
* ts/unpacklimit-config-fix:
transfer.unpackLimit: fetch/receive.unpackLimit takes precedence
git merge-file knows how to merge files on the file system already. It
would be helpful, however, to allow it to also merge single blobs.
Teach it an `--object-id` option which means that its arguments are
object IDs and not files to allow it to do so.
We handle the empty blob specially since read_mmblob doesn't read it
directly and otherwise users cannot specify an empty ancestor.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we have multiple ways to show the value of a given reference, we
do not have any way to check whether a reference exists at all. While
commands like git-rev-parse(1) or git-show-ref(1) can be used to check
for reference existence in case the reference resolves to something
sane, neither of them can be used to check for existence in some other
scenarios where the reference does not resolve cleanly:
- References which have an invalid name cannot be resolved.
- References to nonexistent objects cannot be resolved.
- Dangling symrefs can be resolved via git-symbolic-ref(1), but this
requires the caller to special case existence checks depending on
whether or not a reference is symbolic or direct.
Furthermore, git-rev-list(1) and other commands do not let the caller
distinguish easily between an actually missing reference and a generic
error.
Taken together, this seems like sufficient motivation to introduce a
separate plumbing command to explicitly check for the existence of a
reference without trying to resolve its contents.
This new command comes in the form of `git show-ref --exists`. This
new mode will exit successfully when the reference exists, with a
specific exit code of 2 when it does not exist, or with 1 when there
has been a generic error.
Note that the only way to properly implement this command is by using
the internal `refs_read_raw_ref()` function. While the public function
`refs_resolve_ref_unsafe()` can be made to behave in the same way by
passing various flags, it does not provide any way to obtain the errno
with which the reference backend failed when reading the reference. As
such, it becomes impossible for us to distinguish generic errors from
the explicit case where the reference wasn't found.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The synopsis treats the `--verify` and the implicit mode the same. They
are slightly different though:
- They accept different sets of flags.
- The implicit mode accepts patterns while the `--verify` mode
accepts references.
Split up the synopsis for these two modes such that we can disambiguate
those differences.
While at it, drop "--quiet" from the pattern mode's synopsis. It does
not make a lot of sense to list patterns, but squelch the listing output
itself. The description for "--quiet" is adapted accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-show-ref(1) command has three different modes, of which one is
implicit and the other two can be chosen explicitly by passing a flag.
But while these modes are standalone and cause us to execute completely
separate code paths, we gladly accept the case where a user asks for
both `--exclude-existing` and `--verify` at the same time even though it
is not obvious what will happen. Spoiler: we ignore `--verify` and
execute the `--exclude-existing` mode.
Let's explicitly detect this invalid usage and die in case both modes
were requested.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The patterns subcommand is the last command that still uses global
variables to track its options. Convert it to use a structure instead
with the same motivation as preceding commits.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `show_one()` function implicitly receives a bunch of options which
are tracked via global variables. This makes it hard to see which
subcommands of git-show-ref(1) actually make use of these options.
Introduce a `show_one_options` structure that gets passed down to this
function. This allows us to get rid of more global state and makes it
more explicit which subcommands use those options.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When passing patterns to git-show-ref(1) we're checking whether any
reference matches -- if none do, we indicate this condition via an
unsuccessful exit code.
We're using a global variable to count these matches, which is required
because the counter is getting incremented in a callback function. But
now that we have the `struct show_ref_data` in place, we can get rid of
the global variable and put the counter in there instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's not immediately obvious options which options are applicable to
what subcommand in git-show-ref(1) because all options exist as global
state. This can easily cause confusion for the reader.
Refactor options for the `--exclude-existing` subcommand to be contained
in a separate structure. This structure is stored on the stack and
passed down as required. Consequently, it clearly delimits the scope of
those options and requires the reader to worry less about global state.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When passing patterns to `git show-ref` we have some code that will
cause us to die if `verify && !quiet` is true. But because `verify`
indicates a different subcommand of git-show-ref(1) that causes us to
execute `cmd_show_ref__verify()` and not `cmd_show_ref__patterns()`, the
condition cannot ever be true.
Let's remove this dead code.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a leaking string buffer in `git show-ref --exclude-existing`. While
the buffer is technically not leaking because its variable is declared
as static, there is no inherent reason why it should be.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While not immediately obvious, git-show-ref(1) actually implements three
different subcommands:
- `git show-ref <patterns>` can be used to list references that
match a specific pattern.
- `git show-ref --verify <refs>` can be used to list references.
These are _not_ patterns.
- `git show-ref --exclude-existing` can be used as a filter that
reads references from standard input, performing some conversions
on each of them.
Let's make this more explicit in the code by splitting up the three
subcommands into separate functions. This also allows us to address the
confusingly named `patterns` variable, which may hold either patterns or
reference names.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `pattern` variable is a global variable that tracks either the
reference names (not patterns!) for the `--verify` mode or the patterns
for the non-verify mode. This is a bit confusing due to the slightly
different meanings.
Convert the variable to be local. While this does not yet fix the double
meaning of the variable, this change allows us to address it in a
subsequent patch more easily by explicitly splitting up the different
subcommands of git-show-ref(1).
Note that we introduce a `struct show_ref_data` to pass the patterns to
`show_ref()`. While this is overengineered now, we will extend this
structure in a subsequent patch.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--missing` object option in rev-list currently works only with
missing blobs/trees. For missing commits the revision walker fails with
a fatal error.
Let's extend the functionality of `--missing` option to also support
commit objects. This is done by adding a `missing_objects` field to
`rev_info`. This field is an `oidset` to which we'll add the missing
commits as we encounter them. The revision walker will now continue the
traversal and call `show_commit()` even for missing commits. In rev-list
we can then check if the commit is a missing commit and call the
existing code for parsing `--missing` objects.
A scenario where this option would be used is to find the boundary
objects between different object directories. Consider a repository with
a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
repository, using the `--missing=print` option while disabling the
alternate object directory allows us to find the boundary objects
between the main and alternate object directory.
Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `show_commit()` function already depends on `finish_commit()`, and
in the upcoming commit, we'll also add a dependency on
`finish_object__ma()`. Since in C symbols must be declared before
they're used, let's move `show_commit()` below both `finish_commit()`
and `finish_object__ma()`, so the code is cleaner as a whole without the
need for declarations.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The bit `do_not_die_on_missing_tree` is used in revision.h to ensure the
revision walker does not die when encountering a missing tree. This is
currently exclusively set within `builtin/rev-list.c` to ensure the
`--missing` option works with missing trees.
In the upcoming commits, we will extend `--missing` to also support
missing commits. So let's rename the bit to
`do_not_die_on_missing_objects`, which is object type agnostic and can
be used for both trees/commits.
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* ob/rebase-cleanup:
rebase: move parse_opt_keep_empty() down
rebase: handle --strategy via imply_merge() as well
rebase: simplify code related to imply_merge()
33d7bdd645 (builtin/reflog.c: use parse-options api for expire, delete
subcommands, 2022-01-06) broke the option --single-worktree of git
reflog expire and added a non-printable short flag for it, presumably by
accident. While before it set the variable "all_worktrees" to 0, now it
sets it to 1, its default value. --no-single-worktree is required now
to set it to 0.
Fix it by replacing the variable with one that has the opposite meaning,
to avoid the negation and its potential for confusion. The new variable
"single_worktree" directly captures whether --single-worktree was given.
Also remove the unprintable short flag SOH (start of heading) because it
is undocumented, hard to use and is likely to have been added by mistake
in connection with the negation bug above.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use parentheses and pipes to present alternatives in the argument help
for the --empty options of git am and git rebase, like in the rest of
the documentation.
While at it remove a stray use of the enum empty_action value
STOP_ON_EMPTY_COMMIT to indicate that no short option is present.
While it has a value of 0 and thus there is no user-visible change,
that enum is not meant to hold short option characters. Hard-code 0,
like we do for other options without a short option.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let the parse-options code detect and handle the use of options that are
incompatible with --show-current-patch. This requires exposing the
distinction between the "raw" and "diff" sub-modes. Do that by
splitting the mode RESUME_SHOW_PATCH into RESUME_SHOW_PATCH_RAW and
RESUME_SHOW_PATCH_DIFF and stop tracking sub-modes in a separate struct.
The result is a simpler callback function and more precise error
messages. The original reports a spurious argument or a NULL pointer:
$ git am --show-current-patch --show-current-patch=diff
error: options '--show-current-patch=diff' and '--show-current-patch=raw' cannot be used together
$ git am --show-current-patch=diff --show-current-patch
error: options '--show-current-patch=(null)' and '--show-current-patch=diff' cannot be used together
With this patch we get the more precise:
$ git am --show-current-patch --show-current-patch=diff
error: --show-current-patch=diff is incompatible with --show-current-patch
$ git am --show-current-patch=diff --show-current-patch
error: --show-current-patch is incompatible with --show-current-patch=diff
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-bugreport already rejected unrecognized flag arguments, like
`--diaggnose`, but this doesn't help if the user's mistake was to forget
the `--` in front of the argument. This can result in a user's intended
argument not being parsed with no indication to the user that something
went wrong. Since git-bugreport presently doesn't take any positionals
at all, let's reject all positionals and give the user a usage hint.
Signed-off-by: Emily Shaffer <nasamuffin@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Feeding "git stash store" with a random commit that was not created
by "git stash create" now errors out.
* jc/fail-stash-to-store-non-stash:
stash: be careful what we store
The description of the `git bisect run` command syntax at the beginning
of the manpage is `git bisect run <cmd>...`, which isn't quite clear
about what `<cmd>` is or what the `...` mean; one could think that it is
the whole (quoted) command line with all arguments in a single string,
or that it supports multiple commands, or that it doesn't accept
commands with arguments at all.
Change to `git bisect run <cmd> [<arg>...]` to clarify the syntax,
in both the manpage and the `git bisect -h` command output.
Additionally, change `--term-{new,bad}` et al to `--term-(new|bad)`
for consistency with the synopsis syntax conventions.
Signed-off-by: Javier Mora <cousteaulecommandant@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As per the CodingGuidelines document, it is recommended that error messages
such as die(), error() and warning(), should start with a lowercase letter
and should not end with a period.
This patch adjusts tests to match updated messages.
Signed-off-by: Isoken June Ibizugbe <isokenjune@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git merge-tree" learned to take strategy backend specific options
via the "-X" option, like "git merge" does.
* ty/merge-tree-strategy-options:
merge: introduce {copy|clear}_merge_options()
merge-tree: add -X strategy option
This moves it right next to parse_opt_empty(), which is a much more
logical place. As a side effect, this removes the need for a forward
declaration of imply_merge().
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At least after the successive trimming of enum rebase_type mentioned in
the previous commit, this code did exactly what imply_merge() does, so
just call it instead.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code's evolution left in some bits surrounding enum rebase_type that
don't really make sense any more. In particular, it makes no sense to
invoke imply_merge() if the type is already known not to be
REBASE_APPLY, and it makes no sense to assign the type after calling
imply_merge().
enum rebase_type had more values until commit a74b35081c ("rebase: drop
support for `--preserve-merges`") and commit 10cdb9f38a ("rebase: rename
the two primary rebase backends"). The latter commit also renamed
imply_interactive() to imply_merge().
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git repack" learned "--max-cruft-size" to prevent cruft packs from
growing without bounds.
* tb/repack-max-cruft-size:
repack: free existing_cruft array after use
builtin/repack.c: avoid making cruft packs preferred
builtin/repack.c: implement support for `--max-cruft-size`
builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE
t7700: split cruft-related tests to t7704
These error messages say "new_index" as if that spelling has some
significance to the end users (e.g. the file "$GIT_DIR/new_index"
has some issues), but that is not the case at all. The i18n folks
were made to include the word literally in the translated messages,
which was not a good idea at all. Spell it "new index", as we are
just telling the users that we failed to create a new index file.
The term is expected to be translated to the end-users' languages,
not left as if it were a literal file name.
This dates all the way back to the first re-implemenation of "git
commit" command in C (the scripted version did not have such wording
in its error messages), in f5bbc322 (Port git commit to C.,
2007-11-08).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As described in the CodingGuidelines document, a single line message
given to die() and its friends should not capitalize its first word,
and should not add full-stop at the end.
Signed-off-by: Naomi Ibe <naomi.ibeh69@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In .gitmodules files, submodules are keyed by their names, and the
path to the submodule whose name is $name is specified by the
submodule.$name.path variable. There were a few codepaths that
mixed the name and path up when consulting the submodule database,
which have been corrected. It took long for these bugs to be found
as the name of a submodule initially is the same as its path, and
the problem does not surface until it is moved to a different path,
which apparently happens very rarely.
* js/submodule-fix-misuse-of-path-and-name:
t7420: test that we correctly handle renamed submodules
t7419: test that we correctly handle renamed submodules
t7419, t7420: use test_cmp_config instead of grepping .gitmodules
t7419: actually test the branch switching
submodule--helper: return error from set-url when modifying failed
submodule--helper: use submodule_from_path in set-{url,branch}
Leakfix.
* jk/commit-graph-leak-fixes:
commit-graph: clear oidset after finishing write
commit-graph: free write-context base_graph_name during cleanup
commit-graph: free write-context entries before overwriting
commit-graph: free graph struct that was not added to chain
commit-graph: delay base_graph assignment in add_graph_to_chain()
commit-graph: free all elements of graph chain
commit-graph: move slab-clearing to close_commit_graph()
merge: free result of repo_get_merge_bases()
commit-reach: free temporary list in get_octopus_merge_bases()
t6700: mark test as leak-free
Test coverage for trailers has been improved.
* la/trailer-test-and-doc-updates:
trailer doc: <token> is a <key> or <keyAlias>, not both
trailer doc: separator within key suppresses default separator
trailer doc: emphasize the effect of configuration variables
trailer --unfold help: prefer "reformat" over "join"
trailer --parse docs: add explanation for its usefulness
trailer --only-input: prefer "configuration variables" over "rules"
trailer --parse help: expose aliased options
trailer --no-divider help: describe usual "---" meaning
trailer: trailer location is a place, not an action
trailer doc: narrow down scope of --where and related flags
trailer: add tests to check defaulting behavior with --no-* flags
trailer test description: this tests --where=after, not --where=before
trailer tests: make test cases self-contained
Just like OPT_FILENAME() does, "git grep -f <path>" should treat
the <path> relative to the original $cwd by paying attention to the
prefix the command is given.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git stash store" is meant to store what "git stash create"
produces, as these two are implementation details of the end-user
facing "git stash save" command. Even though it is clearly
documented as such, users would try silly things like "git stash
store HEAD" to render their stash unusable.
Worse yet, because "git stash drop" does not allow such a stash
entry to be removed, "git stash clear" would be the only way to
recover from such a mishap. Reuse the logic that allows "drop" to
refrain from working on such a stash entry to teach "store" to avoid
storing an object that is not a stash entry in the first place.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When mostly the same set of options are to be used to perform
multiple merges, one instance of the merge_options structure may
want to be created and used by copying from the same template
instance. We saw such a use recently in "git merge-tree".
Let's make the pattern official by introducing copy_merge_options()
as a supported way to make a copy of the structure, and also give
clear_merge_options() to release any resources held by a copied
instance. Currently we only make a shallow copy, so the former is a
mere structure assignment while the latter is a no-op, but this may
change in the future as the members of merge_options structure
evolve.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The DESCRIPTION's "first form" is actually the 1st, 2nd, 3rd and 5th
form in SYNOPSIS, the "second form" is the 4th one.
Interestingly, this state of affairs was introduced in
97fe725075 (cat-file docs: fix SYNOPSIS and "-h" output, 2021-12-28)
with the claim of "Now the two will match again." ("the two" being
DESCRIPTION and SYNOPSIS)...
The description also suffers from other correctness and clarity issues,
e.g., the "first form" paragraph discusses -p, -s and -t, but leaves out
-e, which is included in the corresponding SYNOPSIS section; the second
paragraph mentions <format>, which doesn't occur in SYNOPSIS at all, and
of the three batch options, really only describes the behavior of
--batch-check. Also the mention of "drivers" seems an implementation
detail not adding much clarity in a short summary (and isn't expanded
upon in the rest of the man page, either).
Rather than trying to maintain one-to-one (or N-to-M) correspondence
between the DESCRIPTION and SYNOPSIS forms, creating duplication and
providing opportunities for error, shorten the former into a concise
summary describing the two general modes of operation: batch and
non-batch, leaving details to the subsequent manual sections.
While here, fix a grammar error in the description of -e and make the
following further minor improvements:
NAME:
shorten ("content or type and size" isn't the whole story; say
"details" and leave the actual details to later sections)
SYNOPSIS and --help:
move the (--textconv | --filters) form before --batch, closer
to the other non-batch forms
Signed-off-by: Štěpán Němec <stepnem@smrk.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We allocate an array of packed_git pointers so that we can sort the list
of cruft packs, but we never free the array, causing a small leak. Note
that we don't need to free the packed_git structs themselves; they're
owned by the repository object.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When doing a `--geometric` repack, we make sure that the preferred pack
(if writing a MIDX) is the largest pack that we *didn't* repack. That
has the effect of keeping the preferred pack in sync with the pack
containing a majority of the repository's reachable objects.
But if the repository happens to double in size, we'll repack
everything. Here we don't specify any `--preferred-pack`, and instead
let the MIDX code choose.
In the past, that worked fine, since there would only be one pack to
choose from: the one we just wrote. But it's no longer necessarily the
case that there is one pack to choose from. It's possible that the
repository also has a cruft pack, too.
If the cruft pack happens to come earlier in lexical order (and has an
earlier mtime than any non-cruft pack), we'll pick that pack as
preferred. This makes it impossible to reuse chunks of the reachable
pack verbatim from pack-objects, so is sub-optimal.
Luckily, this is a somewhat rare circumstance to be in, since we would
have to repack the entire repository during a `--geometric` repack, and
the cruft pack would have to sort ahead of the pack we just created.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cruft packs are an alternative mechanism for storing a collection of
unreachable objects whose mtimes are recent enough to avoid being
pruned out of the repository.
When cruft packs were first introduced back in b757353676
(builtin/pack-objects.c: --cruft without expiration, 2022-05-20) and
a7d493833f (builtin/pack-objects.c: --cruft with expiration,
2022-05-20), the recommended workflow consisted of:
- Repacking periodically, either by packing anything loose in the
repository (via `git repack -d`) or producing a geometric sequence
of packs (via `git repack --geometric=<d> -d`).
- Every so often, splitting the repository into two packs, one cruft
to store the unreachable objects, and another non-cruft pack to
store the reachable objects.
Repositories may (out of band with the above) choose periodically to
prune out some unreachable objects which have aged out of the grace
period by generating a pack with `--cruft-expiration=<approxidate>`.
This allowed repositories to maintain relatively few packs on average,
and quarantine unreachable objects together in a cruft pack, avoiding
the pitfalls of holding unreachable objects as loose while they age out
(for more, see some of the details in 3d89a8c118
(Documentation/technical: add cruft-packs.txt, 2022-05-20)).
This all works, but can be costly from an I/O-perspective when
frequently repacking a repository that has many unreachable objects.
This problem is exacerbated when those unreachable objects are rarely
(if every) pruned.
Since there is at most one cruft pack in the above scheme, each time we
update the cruft pack it must be rewritten from scratch. Because much of
the pack is reused, this is a relatively inexpensive operation from a
CPU-perspective, but is very costly in terms of I/O since we end up
rewriting basically the same pack (plus any new unreachable objects that
have entered the repository since the last time a cruft pack was
generated).
At the time, we decided against implementing more robust support for
multiple cruft packs. This patch implements that support which we were
lacking.
Introduce a new option `--max-cruft-size` which allows repositories to
accumulate cruft packs up to a given size, after which point a new
generation of cruft packs can accumulate until it reaches the maximum
size, and so on. To generate a new cruft pack, the process works like
so:
- Sort a list of any existing cruft packs in ascending order of pack
size.
- Starting from the beginning of the list, group cruft packs together
while the accumulated size is smaller than the maximum specified
pack size.
- Combine the objects in these cruft packs together into a new cruft
pack, along with any other unreachable objects which have since
entered the repository.
Once a cruft pack grows beyond the size specified via `--max-cruft-size`
the pack is effectively frozen. This limits the I/O churn up to a
quadratic function of the value specified by the `--max-cruft-size`
option, instead of behaving quadratically in the number of total
unreachable objects.
When pruning unreachable objects, we bypass the new code paths which
combine small cruft packs together, and instead start from scratch,
passing in the appropriate `--max-pack-size` down to `pack-objects`,
putting it in charge of keeping the resulting set of cruft packs sized
correctly.
This may seem like further I/O churn, but in practice it isn't so bad.
We could prune old cruft packs for whom all or most objects are removed,
and then generate a new cruft pack with just the remaining set of
objects. But this additional complexity buys us relatively little,
because most objects end up being pruned anyway, so the I/O churn is
well contained.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The repack builtin takes a `--max-pack-size` command-line argument which
it uses to feed into any of the pack-objects children that it may spawn
when generating a new pack.
This option is parsed with OPT_STRING, meaning that we'll accept
anything as input, punting on more fine-grained validation until we get
down into pack-objects.
This is fine, but it's wasteful to spend an entire sub-process just to
figure out that one of its option is bogus. Instead, parse the value of
`--max-pack-size` with OPT_MAGNITUDE in 'git repack', and then pass the
known-good result down to pack-objects.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
set-branch will return an error when setting the config fails so I don't
see why set-url shouldn't. Also skip the sync in this case.
Signed-off-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commands need a path to a submodule but treated it as the name when
modifying the .gitmodules file, leading to confusion when a submodule's
name does not match its path.
Because calling submodule_from_path initializes the submodule cache, we
need to manually trigger a reread before syncing, as the cache is
missing the config change we just made.
Signed-off-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In graph_write() we store commits in an oidset, but never clean it up,
leaking the contents. We should clear it in the cleanup section.
The oidset comes from 6830c36077 (commit-graph.h: replace 'commit_hex'
with 'commits', 2020-04-13), but it was just replacing a string_list
that was also leaked. Curiously, we fixed the leak of some adjacent
variables in commit fa8953cb40 (builtin/commit-graph.c: extract
'read_one_commit()', 2020-05-18), but the oidset wasn't included for
some reason.
In combination with the preceding commits, this lets us mark t5324 as
leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We call repo_get_merge_bases(), which allocates a commit_list, but never
free the result, causing a leak.
The obvious solution is to free it, but we need to look at the contents
of the first item to decide whether to leave the loop. One option is to
free it in both code paths. But since the commit that the list points to
is longer-lived than the list itself, we can just dereference it
immediately, free the list, and then continue with the existing logic.
This is about the same amount of code, but keeps the list management all
in one place.
This lets us mark a number of merge-related test scripts as leak-free.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A previous commit implemented the `gc.repackFilter` config option
to specify a filter that should be used by `git gc` when
performing repacks.
Another previous commit has implemented
`git repack --filter-to=<dir>` to specify the location of the
packfile containing filtered out objects when using a filter.
Let's implement the `gc.repackFilterTo` config option to specify
that location in the config when `gc.repackFilter` is used.
Now when `git gc` will perform a repack with a <dir> configured
through this option and not empty, the repack process will be
passed a corresponding `--filter-to=<dir>` argument.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A previous commit has implemented `git repack --filter=<filter-spec>` to
allow users to filter out some objects from the main pack and move them
into a new different pack.
It would be nice if this new different pack could be created in a
different directory than the regular pack. This would make it possible
to move large blobs into a pack on a different kind of storage, for
example cheaper storage.
Even in a different directory, this pack can be accessible if, for
example, the Git alternates mechanism is used to point to it. In fact
not using the Git alternates mechanism can corrupt a repo as the
generated pack containing the filtered objects might not be accessible
from the repo any more. So setting up the Git alternates mechanism
should be done before using this feature if the user wants the repo to
be fully usable while this feature is used.
In some cases, like when a repo has just been cloned or when there is no
other activity in the repo, it's Ok to setup the Git alternates
mechanism afterwards though. It's also Ok to just inspect the generated
packfile containing the filtered objects and then just move it into the
'.git/objects/pack/' directory manually. That's why it's not necessary
for this command to check that the Git alternates mechanism has been
already setup.
While at it, as an example to show that `--filter` and `--filter-to`
work well with other options, let's also add a test to check that these
options work well with `--max-pack-size`.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A previous commit has implemented `git repack --filter=<filter-spec>` to
allow users to filter out some objects from the main pack and move them
into a new different pack.
Users might want to perform such a cleanup regularly at the same time as
they perform other repacks and cleanups, so as part of `git gc`.
Let's allow them to configure a <filter-spec> for that purpose using a
new gc.repackFilter config option.
Now when `git gc` will perform a repack with a <filter-spec> configured
through this option and not empty, the repack process will be passed a
corresponding `--filter=<filter-spec>` argument.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new option puts the objects specified by `<filter-spec>` into a
separate packfile.
This could be useful if, for example, some blobs take up a lot of
precious space on fast storage while they are rarely accessed. It could
make sense to move them into a separate cheaper, though slower, storage.
It's possible to find which new packfile contains the filtered out
objects using one of the following:
- `git verify-pack -v ...`,
- `test-tool find-pack ...`, which a previous commit added,
- `--filter-to=<dir>`, which a following commit will add to specify
where the pack containing the filtered out objects will be.
This feature is implemented by running `git pack-objects` twice in a
row. The first command is run with `--filter=<filter-spec>`, using the
specified filter. It packs objects while omitting the objects specified
by the filter. Then another `git pack-objects` command is launched using
`--stdin-packs`. We pass it all the previously existing packs into its
stdin, so that it will pack all the objects in the previously existing
packs. But we also pass into its stdin, the pack created by the previous
`git pack-objects --filter=<filter-spec>` command as well as the kept
packs, all prefixed with '^', so that the objects in these packs will be
omitted from the resulting pack. The result is that only the objects
filtered out by the first `git pack-objects` command are in the pack
resulting from the second `git pack-objects` command.
As the interactions with kept packs are a bit tricky, a few related
tests are added.
Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new find_pack_prefix() to refactor code that handles finding
the pack prefix from the packtmp and packdir global variables, as we are
going to need this feature again in following commit.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new finish_pack_objects_cmd() to refactor duplicated code
that handles reading the packfile names from the output of a
`git pack-objects` command and putting it into a string_list, as well as
calling finish_command().
While at it, beautify a code comment a bit in the new function.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org
Signed-off-by: Junio C Hamano <gitster@pobox.com>
9535ce7337 (pack-objects: add list-objects filtering, 2017-11-21)
taught `git pack-objects` to use `--filter`, but required the use of
`--stdout` since a partial clone mechanism was not yet in place to
handle missing objects. Since then, changes like 9e27beaa23
(promisor-remote: implement promisor_remote_get_direct(), 2019-06-25)
and others added support to dynamically fetch objects that were missing.
Even without a promisor remote, filtering out objects can also be useful
if we can put the filtered out objects in a separate pack, and in this
case it also makes sense for pack-objects to write the packfile directly
to an actual file rather than on stdout.
Remove the `--stdout` requirement when using `--filter`, so that in a
follow-up commit, repack can pass `--filter` to pack-objects to omit
certain objects from the resulting packfile.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"checkout --merge -- path" and "update-index --unresolve path" did
not resurrect conflicted state that was resolved to remove path,
but now they do.
* jc/unresolve-removal:
checkout: allow "checkout -m path" to unmerge removed paths
checkout/restore: add basic tests for --merge
checkout/restore: refuse unmerging paths unless checking out of the index
update-index: remove stale fallback code for "--unresolve"
update-index: use unmerge_index_entry() to support removal
resolve-undo: allow resurrecting conflicted state that resolved to deletion
update-index: do not read HEAD and MERGE_HEAD unconditionally
Extract the commonly used initialization of the --stat-width=<width>,
--stat-name-width=<width> and --stat-graph-with=<width> parameters to their
internal default values into a helper function, to avoid repeating the same
initialization code in a few places.
Add a couple of tests to additionally cover existing configuration options
diff.statNameWidth=<width> and diff.statGraphWidth=<width> when used by
git-merge to generate --stat outputs. This closes the gap that existed
previously in the --stat tests, and reduces the chances for having any
regressions introduced by this commit.
While there, perform a small bunch of minor wording tweaks in the improved
unit test, to improve its test-level consistency a bit.
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" learned diff.statNameWidth configuration variable, to
give the default width for the name part in the "--stat" output.
* ds/stat-name-width-configuration:
diff --stat: add config option to limit filename width
Unused parameters in fsmonitor related code paths have been marked
as such.
* jk/fsmonitor-unused-parameter:
run-command: mark unused parameters in start_bg_wait callbacks
fsmonitor: mark unused hashmap callback parameters
fsmonitor/darwin: mark unused parameters in system callback
fsmonitor: mark unused parameters in stub functions
fsmonitor/win32: mark unused parameter in fsm_os__incompatible()
fsmonitor: mark some maybe-unused parameters
fsmonitor/win32: drop unused parameters
fsmonitor: prefer repo_git_path() to git_pathdup()
The load_commit_graph_chain_fd_st() function will stop loading chains
when it sees an error. But if it has loaded any graph slice at all, it
will return it. This is a good thing for normal use (we use what data we
can, and this is just an optimization). But it's a bad thing for
"commit-graph verify", which should be careful about finding any
irregularities. We do complain to stderr with a warning(), but the
verify command still exits with a successful return code.
The new tests here cover corruption of both the base and tip slices of
the chain. The corruption of the base file already works (it is the
first file we look at, so when we see the error we return NULL). The
"tip" case is what is fixed by this patch (it complains to stderr but
still returns the base slice).
Likewise the existing tests for corruption of the commit-graph-chain
file itself need to be updated. We already exited non-zero correctly for
the "base" case, but the "tip" case can now do so, too.
Note that this also causes us to adjust a test later in the file that
similarly corrupts a tip (though confusingly the test script calls this
"base"). It checks stderr but erroneously expects the whole "verify"
command to exit with a successful code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because it's OK to not have a graph file at all, the graph_verify()
function needs to tell the difference between a missing file and a real
error. So when loading a traditional graph file, we call
open_commit_graph() separately from load_commit_graph_chain_fd_st(), and
don't complain if the first one fails with ENOENT.
When the function learned about chain files in 3da4b609bb (commit-graph:
verify chains with --shallow mode, 2019-06-18), we couldn't be as
careful, since the only way to load a chain was with
read_commit_graph_one(), which did both the open/load as a single unit.
So we'll miss errors in chain files we load, thinking instead that there
was just no chain file at all.
Note that we do still report some of these problems to stderr, as the
loading function calls error() and warning(). But we'd exit with a
successful exit code, which is wrong.
We can fix that by using the recently split open/load functions for
chains. That lets us treat the chain file just like a single file with
respect to error handling here.
An existing test (from 3da4b609bb) shows off the problem; we were
expecting "commit-graph verify" to report success, but that makes no
sense. We did not even verify the contents of the graph data, because we
couldn't load it! I don't think this was an intentional exception, but
rather just the test covering what happened to occur.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add merge strategy option to produce more customizable merge result such
as automatically resolving conflicts.
Signed-off-by: Tang Yuyi <winglovet@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to keep track of existing packs in the repository while
repacking has been refactored.
* tb/repack-existing-packs-cleanup:
builtin/repack.c: extract common cruft pack loop
builtin/repack.c: avoid directly inspecting "util"
builtin/repack.c: store existing cruft packs separately
builtin/repack.c: extract `has_existing_non_kept_packs()`
builtin/repack.c: extract redundant pack cleanup for existing packs
builtin/repack.c: extract redundant pack cleanup for --geometric
builtin/repack.c: extract marking packs for deletion
builtin/repack.c: extract structure to store existing packs
The argument order was incorrect. This was introduced by 246cac8505
(i18n: turn even more messages into "cannot be used together" ones,
2022-01-05).
Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git update-index" learns "--show-index-version" to inspect
the index format version used by the on-disk index file.
* jc/update-index-show-index-version:
test-tool: retire "index-version"
update-index: add --show-index-version
update-index doc: v4 is OK with JGit and libgit2
Update "git maintainance" timers' implementation based on systemd
timers to work with WSL.
* js/systemd-timers-wsl-fix:
maintenance(systemd): support the Windows Subsystem for Linux
The start_bg_command() function takes a callback to tell when the
background-ed process is "ready". The callback receives the
child_process struct as well as an extra void pointer. But curiously,
neither of the two users of this interface look at either parameter!
This makes some sense. The only non-test user of the API is fsmonitor,
which uses fsmonitor_ipc__get_state() to connect to a single global
fsmonitor daemon (i.e., the one we just started!).
So we could just drop these parameters entirely. But it seems like a
pretty reasonable interface for the "wait" callback to have access to
the details of the spawned process, and to have room for passing extra
data through a void pointer. So let's leave these in place but mark the
unused ones so that -Wunused-parameter does not complain.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like many hashmap comparison functions, our cookies_cmp() does not look
at its extra void data parameter. This should have been annotated in
02c3c59e62 (hashmap: mark unused callback parameters, 2022-08-19), but
this new case was added around the same time (plus fsmonitor is not
built at all on Linux, so it is easy to miss there).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's a bit of conditionally-compiled code in fsmonitor, so some
function parameters may be unused depending on the build options:
- in fsmonitor--daemon.c's try_to_run_foreground_daemon(), we take a
detach_console argument, but it's only used on Windows. This seems
intentional (and not mistakenly missing other platforms) based on
the discussion in c284e27ba7 (fsmonitor--daemon: implement 'start'
command, 2022-03-25), which introduced it.
- in fsmonitor-setting.c's check_for_incompatible(), we pass the "ipc"
flag down to the system-specific fsm_os__incompatible() helper. But
we can only do so if our platform has such a helper.
In both cases we can mark the argument as MAYBE_UNUSED. That annotates
it enough to suppress the compiler's -Wunused-parameter warning, but
without making it impossible to use the variable, as a regular UNUSED
annotation would.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git grep -e A --no-or -e B" is accepted, even though the negation
of "or" did not mean anything, which has been tightened.
* rs/grep-no-no-or:
grep: reject --no-or
Add new configuration option diff.statNameWidth=<width> that is equivalent
to the command-line option --stat-name-width=<width>, but it is ignored
by format-patch. This follows the logic established by the already
existing configuration option diff.statGraphWidth=<width>.
Limiting the widths of names and graphs in the --stat output makes sense
for interactive work on wide terminals with many columns, hence the support
for these configuration options. They don't affect format-patch because
it already adheres to the traditional 80-column standard.
Update the documentation and add more tests to cover new configuration
option diff.statNameWidth=<width>. While there, perform a few minor code
and whitespace cleanups here and there, as spotted.
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When generating the list of packs to store in a MIDX (when given the
`--write-midx` option), we include any cruft packs both during
--geometric and non-geometric repacks.
But the rules for when we do and don't have to check whether any of
those cruft packs were queued for deletion differ slightly between the
two cases.
But the two can be unified, provided there is a little bit of extra
detail added in the comment to clarify when it is safe to avoid checking
for any pending deletions (and why it is OK to do so even when not
required).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `->util` field corresponding to each string_list_item is used to
track the existence of some pack at the beginning of a repack operation
was originally intended to be used as a bitfield.
This bitfield tracked:
- (1 << 0): whether or not the pack should be deleted
- (1 << 1): whether or not the pack is cruft
The previous commit removed the use of the second bit, but a future
patch (from a different series than this one) will introduce a new use
of it.
So we could stop treating the util pointer as a bitfield and instead
start treating it as if it were a boolean. But this would require some
backtracking when that later patch is applied.
Instead, let's avoid touching the ->util field directly, and instead
introduce convenience functions like:
- pack_mark_for_deletion()
- pack_is_marked_for_deletion()
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When repacking with the `--write-midx` option, we invoke the function
`midx_included_packs()` in order to produce the list of packs we want to
include in the resulting MIDX.
This list is comprised of:
- existing .keep packs
- any pack(s) which were written earlier in the same process
- any unchanged packs when doing a `--geometric` repack
- any cruft packs
Prior to this patch, we stored pre-existing cruft and non-cruft packs
together (provided those packs are non-kept). This meant we needed an
additional bit to indicate which non-kept pack(s) were cruft versus
those that aren't.
But alternatively we can store cruft packs in a separate list, avoiding
the need for this extra bit, and simplifying the code below.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there is:
- at least one pre-existing packfile (which is not marked as kept),
- repacking with the `-d` flag, and
- not doing a cruft repack
, then we pass a handful of additional options to the inner
`pack-objects` process, like `--unpack-unreachable`,
`--keep-unreachable`, and `--pack-loose-unreachable`, in addition to
marking any packs we just wrote for promisor remotes as kept in-core
(with `--keep-pack`, as opposed to the presence of a ".keep" file on
disk).
Because we store both cruft and non-cruft packs together in the same
`existing.non_kept_packs` list, it suffices to check its `nr` member to
see if it is zero or not.
But a following change will store cruft- and non-cruft packs separately,
meaning this check would break as a result. Prepare for this by
extracting this part of the check into a new helper function called
`has_existing_non_kept_packs()`.
This patch does not introduce any functional changes, but prepares us to
make a more isolated change in a subsequent patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To remove redundant packs at the end of a repacking operation, Git uses
its `remove_redundant_pack()` function in a loop over the set of
pre-existing, non-kept packs.
In a later commit, we will split this list into two, one for
pre-existing cruft pack(s), and another for non-cruft pack(s). Prepare
for this by factoring out the routine to loop over and delete redundant
packs into its own function.
Instead of calling `remove_redundant_pack()` directly, we now will call
`remove_redundant_existing_packs()`, which itself dispatches a call to
`remove_redundant_packs_1()`. Note that the geometric repacking code
will still call `remove_redundant_pack()` directly, but see the previous
commit for more details.
Having `remove_redundant_packs_1()` exist as a separate function may
seem like overkill in this patch. However, a later patch will call
`remove_redundant_packs_1()` once over two separate lists, so this
refactoring sets us up for that.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>