1
0
mirror of https://github.com/git/git synced 2024-07-02 15:48:44 +00:00
Commit Graph

1025 Commits

Author SHA1 Message Date
Elijah Newren
bc5c5ec044 cache.h: remove this no-longer-used header
Since this header showed up in some places besides just #include
statements, update/clean-up/remove those other places as well.

Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got
away with violating the rule that all files must start with an include
of git-compat-util.h (or a short-list of alternate headers that happen
to include it first).  This change exposed the violation and caused it
to stop building correctly; fix it by having it include
git-compat-util.h first, as per policy.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 13:39:53 -07:00
Elijah Newren
08c46a499a read-cache*.h: move declarations for read-cache.c functions from cache.h
For the functions defined in read-cache.c, move their declarations from
cache.h to a new header, read-cache-ll.h.  Also move some related inline
functions from cache.h to read-cache.h.  The purpose of the
read-cache-ll.h/read-cache.h split is that about 70% of the sites don't
need the inline functions and the extra headers they include.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 13:39:53 -07:00
Elijah Newren
baf889c2cd sparse-index.h: move declarations for sparse-index.c from cache.h
Note in particular that this reverses the decision made in 118a2e8bde
("cache: move ensure_full_index() to cache.h", 2021-04-01).

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 13:39:53 -07:00
Patrick Steinhardt
c40f0b7877 revision: handle pseudo-opts in --stdin mode
While both git-rev-list(1) and git-log(1) support `--stdin`, it only
accepts commits and files. Most notably, it is impossible to pass any of
the pseudo-opts like `--all`, `--glob=` or others via stdin.

This makes it hard to use this function in certain scripted scenarios,
like when one wants to support queries against specific revisions, but
also against reference patterns. While this is theoretically possible by
using arguments, this may run into issues once we hit platform limits
with sufficiently large queries. And because `--stdin` cannot handle
pseudo-opts, the only alternative would be to use a mixture of arguments
and standard input, which is cumbersome.

Implement support for handling pseudo-opts in both commands to support
this usecase better. One notable restriction here is that `--stdin` only
supports "stuck" arguments in the form of `--glob=foo`. This is because
"unstuck" arguments would also require us to read the next line, which
would add quite some complexity to the code. This restriction should be
fine for scripted usage though.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-15 12:09:31 -07:00
Patrick Steinhardt
af37a209ad revision: small readability improvement for reading from stdin
The code that reads lines from standard input manually compares whether
the read line matches "--", which is a bit awkward to read. Furthermore,
we're about to extend the code to also support reading pseudo-options
via standard input, requiring more elaborate handling of lines with a
leading dash.

Refactor the code by hoisting out the check for "--" outside of the
block that checks for a leading dash.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-15 12:09:31 -07:00
Patrick Steinhardt
cc8045018d revision: reorder read_revisions_from_stdin()
Reorder `read_revisions_from_stdin()` so that we can start using
`handle_revision_pseudo_opt()` without a forward declaration in a
subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-15 12:09:31 -07:00
Junio C Hamano
0807e57807 Merge branch 'en/header-split-cache-h'
Header clean-up.

* en/header-split-cache-h: (24 commits)
  protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
  mailmap, quote: move declarations of global vars to correct unit
  treewide: reduce includes of cache.h in other headers
  treewide: remove double forward declaration of read_in_full
  cache.h: remove unnecessary includes
  treewide: remove cache.h inclusion due to pager.h changes
  pager.h: move declarations for pager.c functions from cache.h
  treewide: remove cache.h inclusion due to editor.h changes
  editor: move editor-related functions and declarations into common file
  treewide: remove cache.h inclusion due to object.h changes
  object.h: move some inline functions and defines from cache.h
  treewide: remove cache.h inclusion due to object-file.h changes
  object-file.h: move declarations for object-file.c functions from cache.h
  treewide: remove cache.h inclusion due to git-zlib changes
  git-zlib: move declarations for git-zlib functions from cache.h
  treewide: remove cache.h inclusion due to object-name.h changes
  object-name.h: move declarations for object-name.c functions from cache.h
  treewide: remove unnecessary cache.h inclusion
  treewide: be explicit about dependence on mem-pool.h
  treewide: be explicit about dependence on oid-array.h
  ...
2023-04-25 13:56:20 -07:00
Elijah Newren
b7b189cd5a treewide: reduce includes of cache.h in other headers
We had a handful of headers including cache.h that didn't need to
anymore.  Drop those includes and replace them with includes of
smaller files, or forward declarations.  However, note that two .c
files now need to directly include cache.h, though they should have
been including it all along given they are directly using structs
defined in it.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:11 -07:00
Elijah Newren
87bed17907 object-file.h: move declarations for object-file.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:10 -07:00
Elijah Newren
dabab1d6e6 object-name.h: move declarations for object-name.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:09 -07:00
Elijah Newren
74ea5c9574 treewide: be explicit about dependence on trace.h & trace2.h
Dozens of files made use of trace and trace2 functions, without
explicitly including trace.h or trace2.h.  This made it more difficult
to find which files could remove a dependence on cache.h.  Make C files
explicitly include trace.h or trace2.h if they are using them.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 08:52:08 -07:00
Junio C Hamano
6047b28eb7 Merge branch 'en/header-split-cleanup'
Split key function and data structure definitions out of cache.h to
new header files and adjust the users.

* en/header-split-cleanup:
  csum-file.h: remove unnecessary inclusion of cache.h
  write-or-die.h: move declarations for write-or-die.c functions from cache.h
  treewide: remove cache.h inclusion due to setup.h changes
  setup.h: move declarations for setup.c functions from cache.h
  treewide: remove cache.h inclusion due to environment.h changes
  environment.h: move declarations for environment.c functions from cache.h
  treewide: remove unnecessary includes of cache.h
  wrapper.h: move declarations for wrapper.c functions from cache.h
  path.h: move function declarations for path.c functions from cache.h
  cache.h: remove expand_user_path()
  abspath.h: move absolute path functions from cache.h
  environment: move comment_line_char from cache.h
  treewide: remove unnecessary cache.h inclusion from several sources
  treewide: remove unnecessary inclusion of gettext.h
  treewide: be explicit about dependence on gettext.h
  treewide: remove unnecessary cache.h inclusion from a few headers
2023-04-06 13:38:31 -07:00
Junio C Hamano
72871b198f Merge branch 'ab/remove-implicit-use-of-the-repository'
Code clean-up around the use of the_repository.

* ab/remove-implicit-use-of-the-repository:
  libs: use "struct repository *" argument, not "the_repository"
  post-cocci: adjust comments for recent repo_* migration
  cocci: apply the "revision.h" part of "the_repository.pending"
  cocci: apply the "rerere.h" part of "the_repository.pending"
  cocci: apply the "refs.h" part of "the_repository.pending"
  cocci: apply the "promisor-remote.h" part of "the_repository.pending"
  cocci: apply the "packfile.h" part of "the_repository.pending"
  cocci: apply the "pretty.h" part of "the_repository.pending"
  cocci: apply the "object-store.h" part of "the_repository.pending"
  cocci: apply the "diff.h" part of "the_repository.pending"
  cocci: apply the "commit.h" part of "the_repository.pending"
  cocci: apply the "commit-reach.h" part of "the_repository.pending"
  cocci: apply the "cache.h" part of "the_repository.pending"
  cocci: add missing "the_repository" macros to "pending"
  cocci: sort "the_repository" rules by header
  cocci: fix incorrect & verbose "the_repository" rules
  cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-06 13:38:30 -07:00
Junio C Hamano
e7dca80692 Merge branch 'ab/remove-implicit-use-of-the-repository' into en/header-split-cache-h
* ab/remove-implicit-use-of-the-repository:
  libs: use "struct repository *" argument, not "the_repository"
  post-cocci: adjust comments for recent repo_* migration
  cocci: apply the "revision.h" part of "the_repository.pending"
  cocci: apply the "rerere.h" part of "the_repository.pending"
  cocci: apply the "refs.h" part of "the_repository.pending"
  cocci: apply the "promisor-remote.h" part of "the_repository.pending"
  cocci: apply the "packfile.h" part of "the_repository.pending"
  cocci: apply the "pretty.h" part of "the_repository.pending"
  cocci: apply the "object-store.h" part of "the_repository.pending"
  cocci: apply the "diff.h" part of "the_repository.pending"
  cocci: apply the "commit.h" part of "the_repository.pending"
  cocci: apply the "commit-reach.h" part of "the_repository.pending"
  cocci: apply the "cache.h" part of "the_repository.pending"
  cocci: add missing "the_repository" macros to "pending"
  cocci: sort "the_repository" rules by header
  cocci: fix incorrect & verbose "the_repository" rules
  cocci: remove dead rule from "the_repository.pending.cocci"
2023-04-04 08:25:52 -07:00
Junio C Hamano
dbb4102f7b Merge branch 'sg/parse-options-h-users'
Code clean-up to include and/or uninclude parse-options.h file as
needed.

* sg/parse-options-h-users:
  treewide: remove unnecessary inclusions of parse-options.h from headers
  treewide: include parse-options.h in source files
2023-03-30 13:47:11 -07:00
Ævar Arnfjörð Bjarmason
ecb5091fd4 cocci: apply the "commit.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"commit.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:45 -07:00
Ævar Arnfjörð Bjarmason
cb338c23d6 cocci: apply the "commit-reach.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"commit-reach.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:36 -07:00
Ævar Arnfjörð Bjarmason
d850b7a545 cocci: apply the "cache.h" part of "the_repository.pending"
Apply the part of "the_repository.pending.cocci" pertaining to
"cache.h".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-28 07:36:36 -07:00
Elijah Newren
e38da487cc setup.h: move declarations for setup.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:54 -07:00
Elijah Newren
32a8f51061 environment.h: move declarations for environment.c functions from cache.h
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:53 -07:00
Elijah Newren
f394e093df treewide: be explicit about dependence on gettext.h
Dozens of files made use of gettext functions, without explicitly
including gettext.h.  This made it more difficult to find which files
could remove a dependence on cache.h.  Make C files explicitly include
gettext.h if they are using it.

However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an
include of gettext.h, it was left out to avoid conflicting with an
in-flight topic.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21 10:56:51 -07:00
SZEDER Gábor
49fd551194 treewide: include parse-options.h in source files
The builtins 'ls-remote', 'pack-objects', 'receive-pack', 'reflog' and
'send-pack' use parse_options(), but their source files don't directly
include 'parse-options.h'.  Furthermore, the source files
'diagnose.c', 'list-objects-filter-options.c', 'remote.c' and
'send-pack.c' define option parsing callback functions, while
'revision.c' defines an option parsing helper function, and thus need
access to various fields in 'struct option' and 'struct
parse_opt_ctx_t', but they don't directly include 'parse-options.h'
either.  They all can still be built, of course, because they include
one of the header files that does include 'parse-options.h' (though
unnecessarily, see the next commit).

Add those missing includes to these files, as our general rule is that
"a C file must directly include the header files that declare the
functions and the types it uses".

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 11:26:47 -07:00
Junio C Hamano
4d87411ffe Merge branch 'ew/fetch-hiderefs'
A new "fetch.hideRefs" option can be used to exclude specified refs
from "rev-list --objects --stdin --not --all" traversal for
checking object connectivity, most useful when there are many
unrelated histories in a single repository.

* ew/fetch-hiderefs:
  fetch: support hideRefs to speed up connectivity checks
2023-03-17 14:03:10 -07:00
Junio C Hamano
d0732a8120 Merge branch 'jk/unused-post-2.39-part2'
More work towards -Wunused.

* jk/unused-post-2.39-part2: (21 commits)
  help: mark unused parameter in git_unknown_cmd_config()
  run_processes_parallel: mark unused callback parameters
  userformat_want_item(): mark unused parameter
  for_each_commit_graft(): mark unused callback parameter
  rewrite_parents(): mark unused callback parameter
  fetch-pack: mark unused parameter in callback function
  notes: mark unused callback parameters
  prio-queue: mark unused parameters in comparison functions
  for_each_object: mark unused callback parameters
  list-objects: mark unused callback parameters
  mark unused parameters in signal handlers
  run-command: mark error routine parameters as unused
  mark "pointless" data pointers in callbacks
  ref-filter: mark unused callback parameters
  http-backend: mark unused parameters in virtual functions
  http-backend: mark argc/argv unused
  object-name: mark unused parameters in disambiguate callbacks
  serve: mark unused parameters in virtual functions
  serve: use repository pointer to get config
  ls-refs: drop config caching
  ...
2023-03-17 14:03:09 -07:00
Eric Wong
c6ce27ab08 fetch: support hideRefs to speed up connectivity checks
With roughly 800 remotes all fetching into their own
refs/remotes/$REMOTE/* island, the connectivity check[1] gets
expensive for each fetch on systems which lack sufficient RAM to
cache objects.

To do a no-op fetch on one $REMOTE out of hundreds, hideRefs now
allows the no-op fetch to take ~30 seconds instead of ~20 minutes
on a noisy, RAM-constrained machine (localhost, so no network latency):

   git -c fetch.hideRefs=refs \
	-c fetch.hideRefs='!refs/remotes/$REMOTE/' \
	fetch $REMOTE

[1] `git rev-list --objects --stdin --not --all --quiet --alternate-refs'

Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-27 09:27:03 -08:00
Jeff King
be252d3349 for_each_object: mark unused callback parameters
The for_each_{loose,packed}_object interface uses callback functions,
but not every callback needs all of the parameters. Mark the unused ones
to satisfy -Wunused-parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24 09:13:31 -08:00
Jeff King
d3dcfa047f mark "pointless" data pointers in callbacks
Both the object_array_filter() and trie_find() functions use callback
functions that let the caller specify which elements match. These
callbacks take a void pointer in case the caller wants to pass in extra
data. But in each case, the single user of these functions just passes
NULL, and the callback ignores the extra pointer.

We could just remove these unused parameters from the callback interface
entirely. But it's good practice to provide such a pointer, as it guides
future callers of the function in the right direction (rather than
tempting them to access global data). Plus it's consistent with other
generic callback interfaces.

So let's instead annotate the unused parameters, in order to silence the
compiler's -Wunused-parameter warning.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-24 09:13:30 -08:00
Elijah Newren
41771fa435 cache.h: remove dependence on hex.h; make other files include it explicitly
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:29 -08:00
Elijah Newren
36bf195890 alloc.h: move ALLOC_GROW() functions from cache.h
This allows us to replace includes of cache.h with includes of the much
smaller alloc.h in many places.  It does mean that we also need to add
includes of alloc.h in a number of C files.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-23 17:25:28 -08:00
Ævar Arnfjörð Bjarmason
6269f8eaad treewide: always have a valid "index_state.repo" member
When the "repo" member was added to "the_index" in [1] the
repo_read_index() was made to populate it, but the unpopulated
"the_index" variable didn't get the same treatment.

Let's do that in initialize_the_repository() when we set it up, and
likewise for all of the current callers initialized an empty "struct
index_state".

This simplifies code that needs to deal with "the_index" or a custom
"struct index_state", we no longer need to second-guess this part of
the "index_state" deep in the stack. A recent example of such
second-guessing is the "istate->repo ? istate->repo : the_repository"
code in [2]. We can now simply use "istate->repo".

We're doing this by making use of the INDEX_STATE_INIT() macro (and
corresponding function) added in [3], which now have mandatory "repo"
arguments.

Because we now call index_state_init() in repository.c's
initialize_the_repository() we don't need to handle the case where we
have a "repo->index" whose "repo" member doesn't match the "repo"
we're setting up, i.e. the "Complete the double-reference" code in
repo_read_index() being altered here. That logic was originally added
in [1], and was working around the lack of what we now have in
initialize_the_repository().

For "fsmonitor-settings.c" we can remove the initialization of a NULL
"r" argument to "the_repository". This was added back in [4], and was
needed at the time for callers that would pass us the "r" from an
"istate->repo". Before this change such a change to
"fsmonitor-settings.c" would segfault all over the test suite (e.g. in
t0002-gitfile.sh).

This change has wider eventual implications for
"fsmonitor-settings.c". The reason the other lazy loading behavior in
it is required (starting with "if (!r->settings.fsmonitor) ..." is
because of the previously passed "r" being "NULL".

I have other local changes on top of this which move its configuration
reading to "prepare_repo_settings()" in "repo-settings.c", as we could
now start to rely on it being called for our "r". But let's leave all
of that for now, and narrowly remove this particular part of the
lazy-loading.

1. 1fd9ae517c (repository: add repo reference to index_state,
   2021-01-23)
2. ee1f0c242e (read-cache: add index.skipHash config option,
   2023-01-06)
3. 2f6b1eb794 (cache API: add a "INDEX_STATE_INIT" macro/function,
   add release_index(), 2023-01-12)
4. 1e0ea5c431 (fsmonitor: config settings are repository-specific,
   2022-03-25)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-17 14:32:06 -08:00
Ævar Arnfjörð Bjarmason
2f6b1eb794 cache API: add a "INDEX_STATE_INIT" macro/function, add release_index()
Hopefully in some not so distant future, we'll get advantages from always
initializing the "repo" member of the "struct index_state". To make
that easier let's introduce an initialization macro & function.

The various ad-hoc initialization of the structure can then be changed
over to it, and we can remove the various "0" assignments in
discard_index() in favor of calling index_state_init() at the end.

While not strictly necessary, let's also change the CALLOC_ARRAY() of
various "struct index_state *" to use an ALLOC_ARRAY() followed by
index_state_init() instead.

We're then adding the release_index() function and converting some
callers (including some of these allocations) over to it if they
either won't need to use their "struct index_state" again, or are just
about to call index_state_init().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-16 10:46:58 -08:00
Junio C Hamano
179547932f Merge branch 'jk/unused-post-2.39'
Code clean-up around unused function parameters.

* jk/unused-post-2.39:
  userdiff: mark unused parameter in internal callback
  list-objects-filter: mark unused parameters in virtual functions
  diff: mark unused parameters in callbacks
  xdiff: mark unused parameter in xdl_call_hunk_func()
  xdiff: drop unused parameter in def_ff()
  ws: drop unused parameter from ws_blank_line()
  list-objects: drop process_gitlink() function
  blob: drop unused parts of parse_blob_buffer()
  ls-refs: use repository parameter to iterate refs
2022-12-26 11:42:05 +09:00
Junio C Hamano
9ea1378d04 Merge branch 'ab/various-leak-fixes'
Various leak fixes.

* ab/various-leak-fixes:
  built-ins: use free() not UNLEAK() if trivial, rm dead code
  revert: fix parse_options_concat() leak
  cherry-pick: free "struct replay_opts" members
  rebase: don't leak on "--abort"
  connected.c: free the "struct packed_git"
  sequencer.c: fix "opts->strategy" leak in read_strategy_opts()
  ls-files: fix a --with-tree memory leak
  revision API: call graph_clear() in release_revisions()
  unpack-file: fix ancient leak in create_temp_file()
  built-ins & libs & helpers: add/move destructors, fix leaks
  dir.c: free "ident" and "exclude_per_dir" in "struct untracked_cache"
  read-cache.c: clear and free "sparse_checkout_patterns"
  commit: discard partial cache before (re-)reading it
  {reset,merge}: call discard_index() before returning
  tests: mark tests as passing with SANITIZE=leak
2022-12-14 15:55:46 +09:00
Jeff King
61bdc7c5d8 diff: mark unused parameters in callbacks
The diff code provides a format_callback interface, but not every
callback needs each parameter (e.g., the "opt" and "data" parameters are
frequently left unused). Likewise for the output_prefix callback, the
low-level change/add_remove interfaces, the callbacks used by
xdi_diff(), etc.

Mark unused arguments in the callback implementations to quiet
-Wunused-parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-13 22:16:23 +09:00
Junio C Hamano
f8828f9125 Merge branch 'ps/receive-use-only-advertised'
"git receive-pack" used to use all the local refs as the boundary for
checking connectivity of the data "git push" sent, but now it uses
only the refs that it advertised to the pusher. In a repository with
the .hideRefs configuration, this reduces the resources needed to
perform the check.
cf. <221028.86bkpw805n.gmgdl@evledraar.gmail.com>
cf. <xmqqr0yrizqm.fsf@gitster.g>

* ps/receive-use-only-advertised:
  receive-pack: only use visible refs for connectivity check
  rev-parse: add `--exclude-hidden=` option
  revision: add new parameter to exclude hidden refs
  revision: introduce struct to handle exclusions
  revision: move together exclusion-related functions
  refs: get rid of global list of hidden refs
  refs: fix memory leak when parsing hideRefs config
2022-11-23 11:22:25 +09:00
Ævar Arnfjörð Bjarmason
fc47252d5b revision API: call graph_clear() in release_revisions()
Call graph_clear() in release_revisions(), this will free memory
allocated by e.g. this command, which will now run without memory
leaks:

	git -P log -1 --graph --no-graph --graph

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Patrick Steinhardt
8c1bc2a71a revision: add new parameter to exclude hidden refs
Users can optionally hide refs from remote users in git-upload-pack(1),
git-receive-pack(1) and others via the `transfer.hideRefs`, but there is
not an easy way to obtain the list of all visible or hidden refs right
now. We'll require just that though for a performance improvement in our
connectivity check.

Add a new option `--exclude-hidden=` that excludes any hidden refs from
the next pseudo-ref like `--all` or `--branches`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-17 16:22:52 -05:00
Patrick Steinhardt
1e9f273ac0 revision: introduce struct to handle exclusions
The functions that handle exclusion of refs work on a single string
list. We're about to add a second mechanism for excluding refs though,
and it makes sense to reuse much of the same architecture for both kinds
of exclusion.

Introduce a new `struct ref_exclusions` that encapsulates all the logic
related to excluding refs and move the `struct string_list` that holds
all wildmatch patterns of excluded refs into it. Rename functions that
operate on this struct to match its name.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-17 16:22:52 -05:00
Patrick Steinhardt
05b9425960 revision: move together exclusion-related functions
Move together the definitions of functions that handle exclusions of
refs so that related functionality sits in a single place, only.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-17 16:22:51 -05:00
Ævar Arnfjörð Bjarmason
916ebb327c revisions API: extend the nascent REV_INFO_INIT macro
Have the REV_INFO_INIT macro added in [1] declare more members of
"struct rev_info" that we can initialize statically, and have
repo_init_revisions() do so with the memcpy(..., &blank) idiom
introduced in [2].

As the comment for the "REV_INFO_INIT" macro notes this still isn't
sufficient to initialize a "struct rev_info" for use yet. But we are
getting closer to that eventual goal.

Even though we can't fully initialize a "struct rev_info" with
REV_INFO_INIT it's useful for readability to clearly separate those
things that we can statically initialize, and those that we can't.

This change could replace the:

	list_objects_filter_init(&revs->filter);

In the repo_init_revisions() with this line, at the end of the
REV_INFO_INIT deceleration in revisions.h:

	.filter = LIST_OBJECTS_FILTER_INIT, \

But doing so would produce a minor conflict with an outstanding
topic[3]. Let's skip that for now. I have follow-ups to initialize
more of this statically, e.g. changes to get rid of grep_init(). We
can initialize more members with the macro in a future series.

1. f196c1e908 (revisions API users: use release_revisions() needing
   REV_INFO_INIT, 2022-04-13)
2. 5726a6b401 (*.c *_init(): define in terms of corresponding *_INIT
   macro, 2021-07-01)
3. https://lore.kernel.org/git/265b292ed5c2de19b7118dfe046d3d9d932e2e89.1667901510.git.ps@pks.im/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-08 16:34:01 -05:00
Junio C Hamano
b988427918 Merge branch 'rs/diff-caret-bang-with-parents'
"git diff rev^!" did not show combined diff to go to the rev from
its parents.

* rs/diff-caret-bang-with-parents:
  diff: support ^! for merges
  revisions.txt: unspecify order of resolved parts of ^!
  revision: use strtol_i() for exclude_parent
2022-10-25 17:11:43 -07:00
Ævar Arnfjörð Bjarmason
82dd01d81b CodingGuidelines: allow declaring variables in for loops
Since 44ba10d671 (revision: use C99 declaration of variable in for()
loop, 2021-11-14) released with v2.35.0 we've had a variable declared
with in a for loop.

Since then we've had inadvertent follow-ups to that with at least
cb2607759e (merge-ort: store more specific conflict information,
2022-06-18) released with v2.38.0.

As November 2022 is within the window of this upcoming release,
let's update the guideline to allow this.  We can have the promised
"revisit" discussion while this patch cooks, and drop it if it turns
out that it is still premature, which is not expected to happen at
this moment.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-10 13:41:11 -07:00
René Scharfe
793c21182e revision: use strtol_i() for exclude_parent
Avoid silent overflow of the int exclude_parent by using the appropriate
function, strtol_i(), to parse its value.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-01 15:58:33 -07:00
Junio C Hamano
298a958224 Merge branch 'jk/list-objects-filter-cleanup'
A couple of bugfixes with code clean-up.

* jk/list-objects-filter-cleanup:
  list-objects-filter: convert filter_spec to a strbuf
  list-objects-filter: add and use initializers
  list-objects-filter: handle null default filter spec
  list-objects-filter: don't memset after releasing filter struct
2022-09-19 14:35:24 -07:00
Junio C Hamano
dd407f1c7c Merge branch 'ab/unused-annotation'
Undoes 'jk/unused-annotation' topic and redoes it to work around
Coccinelle rules misfiring false positives in unrelated codepaths.

* ab/unused-annotation:
  git-compat-util.h: use "deprecated" for UNUSED variables
  git-compat-util.h: use "UNUSED", not "UNUSED(var)"
2022-09-14 12:56:39 -07:00
Junio C Hamano
a6b42ec0c6 Merge branch 'jk/unused-annotation'
Annotate function parameters that are not used (but cannot be
removed for structural reasons), to prepare us to later compile
with -Wunused warning turned on.

* jk/unused-annotation:
  is_path_owned_by_current_uid(): mark "report" parameter as unused
  run-command: mark unused async callback parameters
  mark unused read_tree_recursive() callback parameters
  hashmap: mark unused callback parameters
  config: mark unused callback parameters
  streaming: mark unused virtual method parameters
  transport: mark bundle transport_options as unused
  refs: mark unused virtual method parameters
  refs: mark unused reflog callback parameters
  refs: mark unused each_ref_fn parameters
  git-compat-util: add UNUSED macro
2022-09-14 12:56:39 -07:00
Junio C Hamano
655e494047 Merge branch 'jk/rev-list-verify-objects-fix'
"git rev-list --verify-objects" ought to inspect the contents of
objects and notice corrupted ones, but it didn't when the commit
graph is in use, which has been corrected.

* jk/rev-list-verify-objects-fix:
  rev-list: disable commit graph with --verify-objects
  lookup_commit_in_graph(): use prepare_commit_graph() to check for graph
2022-09-13 11:38:24 -07:00
Jeff King
2a01bdedf8 list-objects-filter: add and use initializers
In 7e2619d8ff (list_objects_filter_options: plug leak of filter_spec
strings, 2022-09-08), we noted that the filter_spec string_list was
inconsistent in how it handled memory ownership of strings stored in the
list. The fix there was a bit of a band-aid to set the "strdup_strings"
variable right before adding anything.

That works OK, and it lets the users of the API continue to
zero-initialize the struct. But it makes the code a bit hard to follow
and accident-prone, as any other spots appending the filter_spec need to
think about whether to set the strdup_strings value, too (there's one
such spot in partial_clone_get_default_filter_spec(), which is probably
a possible memory leak).

So let's do that full cleanup now. We'll introduce a
LIST_OBJECTS_FILTER_INIT macro and matching function, and use them as
appropriate (though it is for the "_options" struct, this matches the
corresponding list_objects_filter_release() function).

This is harder than it seems! Many other structs, like
git_transport_data, embed the filter struct. So they need to initialize
it themselves even if the rest of the enclosing struct is OK with
zero-initialization. I found all of the relevant spots by grepping
manually for declarations of list_objects_filter_options. And then doing
so recursively for structs which embed it, and ones which embed those,
and so on.

I'm pretty sure I got everything, but there's no change that would alert
the compiler if any topics in flight added new declarations. To catch
this case, we now double-check in the parsing function that things were
initialized as expected and BUG() if appropriate.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-12 08:38:59 -07:00
Jeff King
9a8c3c4a5f parse_object(): check commit-graph when skip_hash set
If the caller told us that they don't care about us checking the object
hash, then we're free to implement any optimizations that get us the
parsed value more quickly. An obvious one is to check the commit graph
before loading an object from disk. And in fact, both of the callers who
pass in this flag are already doing so before they call parse_object()!

So we can simplify those callers, as well as any possible future ones,
by moving the logic into parse_object().

There are two subtle things to note in the diff, but neither has any
impact in practice:

  - it seems least-surprising here to do the graph lookup on the
    git-replace'd oid, rather than the original. This is in theory a
    change of behavior from the earlier code, as neither caller did a
    replace lookup itself. But in practice it doesn't matter, as we
    disable the commit graph entirely if there are any replace refs.

  - the caller in get_reference() passes the skip_hash flag only if
    revs->verify_objects isn't set, whereas it would look in the commit
    graph unconditionally. In practice this should not matter as we
    should disable the commit graph entirely when using verify_objects
    (and that was done recently in another patch).

So this should be a pure cleanup with no behavior change.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-07 12:27:02 -07:00
Jeff King
0bc2557951 upload-pack: skip parse-object re-hashing of "want" objects
Imagine we have a history with commit C pointing to a large blob B.
If a client asks us for C, we can generally serve both objects to them
without accessing the uncompressed contents of B. In upload-pack, we
figure out which commits we have and what the client has, and feed those
tips to pack-objects. In pack-objects, we traverse the commits and trees
(or use bitmaps!) to find the set of objects needed, but we never open
up B. When we serve it to the client, we can often pass the compressed
bytes directly from the on-disk packfile over the wire.

But if a client asks us directly for B, perhaps because they are doing
an on-demand fetch to fill in the missing blob of a partial clone, we
end up much slower. Upload-pack calls parse_object() on the oid we
receive, which opens up the object and re-checks its hash (even though
if it were a commit, we might skip this parse entirely in favor of the
commit graph!). And then we feed the oid directly to pack-objects, which
again calls parse_object() and opens the object. And then finally, when
we write out the result, we may send bytes straight from disk, but only
after having unnecessarily uncompressed and computed the sha1 of the
object twice!

This patch teaches both code paths to use the new SKIP_HASH_CHECK flag
for parse_object(). You can see the speed-up in p5600, which does a
blob:none clone followed by a checkout. The savings for git.git are
modest:

  Test                          HEAD^             HEAD
  ----------------------------------------------------------------------
  5600.3: checkout of result    2.23(4.19+0.24)   1.72(3.79+0.18) -22.9%

But the savings scale with the number of bytes. So on a repository like
linux.git with more files, we see more improvement (in both absolute and
relative numbers):

  Test                          HEAD^                HEAD
  ----------------------------------------------------------------------------
  5600.3: checkout of result    51.62(77.26+2.76)    34.86(61.41+2.63) -32.5%

And here's an even more extreme case. This is the android gradle-plugin
repository, whose tip checkout has ~3.7GB of files:

  Test                          HEAD^               HEAD
  --------------------------------------------------------------------------
  5600.3: checkout of result    79.51(90.84+5.55)   40.28(51.88+5.67) -49.3%

Keep in mind that these timings are of the whole checkout operation. So
they count the client indexing the pack and actually writing out the
files. If we want to see just the server's view, we can hack up the
GIT_TRACE_PACKET output from those operations and replay it via
upload-pack. For the gradle example, that gives me:

  Benchmark 1: GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input
    Time (mean ± σ):     50.884 s ±  0.239 s    [User: 51.450 s, System: 1.726 s]
    Range (min … max):   50.608 s … 51.025 s    3 runs

  Benchmark 2: GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input
    Time (mean ± σ):      9.728 s ±  0.112 s    [User: 10.466 s, System: 1.535 s]
    Range (min … max):    9.618 s …  9.842 s    3 runs

  Summary
    'GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input' ran
      5.23 ± 0.07 times faster than 'GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input'

So a server would see an 80% reduction in CPU serving the initial
checkout of a partial clone for this repository. Or possibly even more
depending on the packing; most of the time spent in the faster one were
objects we had to open during the write phase.

In both cases skipping the extra hashing on the server should be pretty
safe. The client doesn't trust the server anyway, so it will re-hash all
of the objects via index-pack. There is one thing to note, though: the
change in get_reference() affects not just pack-objects, but rev-list,
git-log, etc. We could use a flag to limit to index-pack here, but we
may already skip hash checks in this instance. For commits, we'd skip
anything we load via the commit-graph. And while before this commit we
would check a blob fed directly to rev-list on the command-line, we'd
skip checking that same blob if we found it by traversing a tree.

The exception for both is if --verify-objects is used. In that case,
we'll skip this optimization, and the new test makes sure we do this
correctly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-07 12:20:02 -07:00
Jeff King
b27ccae34b rev-list: disable commit graph with --verify-objects
Since the point of --verify-objects is to actually load and checksum the
bytes of each object, optimizing out reads using the commit graph runs
contrary to our goal.

The most targeted way to implement this would be for the revision
traversal code to check revs->verify_objects and avoid using the commit
graph. But it's difficult to be sure we've hit all of the correct spots.
For instance, I started this patch by writing the first of the included
test cases, where the corrupted commit is directly on rev-list's command
line. And that is easy to fix by teaching get_reference() to check
revs->verify_objects before calling lookup_commit_in_graph().

But that doesn't cover the second test case: when we traverse to a
corrupted commit, we'd parse the parent in process_parents(). So we'd
need to check there, too. And it keeps going. In handle_commit() we
sometimes parses commits, too, though I couldn't figure out a way to
trigger it that did not already parse via get_reference() or tag
peeling. And try_to_simplify_commit() has its own parse call, and so on.

So it seems like the safest thing is to just disable the commit graph
for the whole process when we see the --verify-objects option. We can do
that either in builtin/rev-list.c, where we use the option, or in
revision.c, where we parse it. There are some subtleties:

  - putting it in rev-list.c is less surprising in some ways, because
    there we know we are just doing a single traversal. In a command
    which does multiple traversals in a single process, it's rather
    unexpected to globally disable the commit graph.

  - putting it in revision.c is less surprising in some ways, because
    the caller does not have to remember to disable the graph
    themselves. But this is already tricky! The verify_objects flag in
    rev_info doesn't do anything by itself. The caller has to provide an
    object callback which does the right thing.

  - for that reason, in practice nobody but rev-list uses this option in
    the first place. So the distinction is probably not important either
    way. Arguably it should just be an option of rev-list, and not the
    general revision machinery; right now you can run "git log
    --verify-objects", but it does not actually do anything useful.

  - checking for a parsed revs.verify_objects flag in rev-list.c is too
    late. By that time we've already passed the arguments to
    setup_revisions(), which will have parsed the commits using the
    graph.

So this commit disables the graph as soon as we see the option in
revision.c. That's a pretty broad hammer, but it does what we want, and
in practice nobody but rev-list is using this flag anyway.

The tests cover both the "tip" and "parent" cases. Obviously our hammer
hits them both in this case, but it's good to check both in case
somebody later tries the more focused approach.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-07 09:44:30 -07:00
Ævar Arnfjörð Bjarmason
5cf88fd8b0 git-compat-util.h: use "UNUSED", not "UNUSED(var)"
As reported in [1] the "UNUSED(var)" macro introduced in
2174b8c75d (Merge branch 'jk/unused-annotation' into next,
2022-08-24) breaks coccinelle's parsing of our sources in files where
it occurs.

Let's instead partially go with the approach suggested in [2] of
making this not take an argument. As noted in [1] "coccinelle" will
ignore such tokens in argument lists that it doesn't know about, and
it's less of a surprise to syntax highlighters.

This undoes the "help us notice when a parameter marked as unused is
actually use" part of 9b24034754 (git-compat-util: add UNUSED macro,
2022-08-19), a subsequent commit will further tweak the macro to
implement a replacement for that functionality.

1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01 10:49:48 -07:00
Jeff King
02c3c59e62 hashmap: mark unused callback parameters
Hashmap comparison functions must conform to a particular callback
interface, but many don't use all of their parameters. Especially the
void cmp_data pointer, but some do not use keydata either (because they
can easily form a full struct to pass when doing lookups). Let's mark
these to make -Wunused-parameter happy.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:18:55 -07:00
Jeff King
c006e9fa59 refs: mark unused reflog callback parameters
Functions used with for_each_reflog_ent() need to conform to a
particular interface, but not every function needs all of the
parameters. Mark the unused ones to make -Wunused-parameter happy.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:18:54 -07:00
Jeff King
63e14ee2d6 refs: mark unused each_ref_fn parameters
Functions used with for_each_ref(), etc, need to conform to the
each_ref_fn interface. But most of them don't need every parameter;
let's annotate the unused ones to quiet -Wunused-parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 12:18:54 -07:00
Elijah Newren
257418c590 revision: allow --ancestry-path to take an argument
We have long allowed users to run e.g.
    git log --ancestry-path master..seen
which shows all commits which satisfy all three of these criteria:
  * are an ancestor of seen
  * are not an ancestor of master
  * have master as an ancestor

This commit allows another variant:
    git log --ancestry-path=$TOPIC master..seen
which shows all commits which satisfy all of these criteria:
  * are an ancestor of seen
  * are not an ancestor of master
  * have $TOPIC in their ancestry-path
that last bullet can be defined as commits meeting any of these
criteria:
    * are an ancestor of $TOPIC
    * have $TOPIC as an ancestor
    * are $TOPIC

This also allows multiple --ancestry-path arguments, which can be
used to find commits with any of the given topics in their ancestry
path.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 10:45:08 -07:00
Junio C Hamano
83489a5b20 Merge branch 'ab/plug-revisions-leak'
Plug a bit more leaks in the revisions API.

* ab/plug-revisions-leak:
  revisions API: don't leak memory on argv elements that need free()-ing
  bisect.c: partially fix bisect_rev_setup() memory leak
  log: refactor "rev.pending" code in cmd_show()
  log: fix a memory leak in "git show <revision>..."
  test-fast-rebase helper: use release_revisions() (again)
  bisect.c: add missing "goto" for release_revisions()
2022-08-12 13:19:08 -07:00
Junio C Hamano
a6aeb2fef9 Merge branch 'jc/resolve-undo' into maint
The resolve-undo information in the index was not protected against
GC, which has been corrected.
source: <xmqq35f7kzad.fsf@gitster.g>

* jc/resolve-undo:
  fsck: do not dereference NULL while checking resolve-undo data
  revision: mark blobs needed for resolve-undo as reachable
2022-08-10 21:52:32 -07:00
Junio C Hamano
87098a047b Merge branch 'sa/cat-file-mailmap'
"git cat-file" learned an option to use the mailmap when showing
commit and tag objects.

* sa/cat-file-mailmap:
  cat-file: add mailmap support
  ident: rename commit_rewrite_person() to apply_mailmap_to_header()
  ident: move commit_rewrite_person() to ident.c
  revision: improve commit_rewrite_person()
2022-08-03 13:36:08 -07:00
Ævar Arnfjörð Bjarmason
f92dbdbc6a revisions API: don't leak memory on argv elements that need free()-ing
Add a "free_removed_argv_elements" member to "struct
setup_revision_opt", and use it to fix several memory leaks.

We have various memory leaks in APIs that take and munge "const
char **argv", e.g. parse_options(). Sometimes these APIs are given the
"argv" we get to the "main" function, in which case we don't leak
memory, but other times we're giving it the "v" member of a "struct
strvec" we created.

There's several potential ways to fix those sort of leaks, we could
add a "nodup" mode to "struct strvec", which would work for the cases
where we push constant strings to it. But that wouldn't work as soon
as we used strvec_pushf(), or otherwise needed to duplicate or create
a string for that "struct strvec".

Let's instead make it the responsibility of the revisions API. If it's
going to clobber elements of argv it can also free() them, which it
will now do if instructed to do so via "free_removed_argv_elements".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-03 11:12:36 -07:00
Junio C Hamano
418aef9055 Merge branch 'jc/resolve-undo'
The resolve-undo information in the index was not protected against
GC, which has been corrected.

* jc/resolve-undo:
  fsck: do not dereference NULL while checking resolve-undo data
  revision: mark blobs needed for resolve-undo as reachable
2022-07-19 16:40:16 -07:00
Siddharth Asthana
66a8a95315 ident: rename commit_rewrite_person() to apply_mailmap_to_header()
commit_rewrite_person() takes a commit buffer and replaces the idents
in the header with their canonical versions using the mailmap mechanism.
The name "commit_rewrite_person()" is misleading as it doesn't convey
what kind of rewrite are we going to do to the buffer. It also doesn't
clearly mention that the function will limit itself to the header part
of the buffer. The new name, "apply_mailmap_to_header()", expresses the
functionality of the function pretty clearly.

We intend to use apply_mailmap_to_header() in git-cat-file to replace
idents in the headers of commit and tag object buffers. So, we will be
extending this function to take tag objects buffer as well and replace
idents on the tagger header using the mailmap mechanism.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 12:55:53 -07:00
Siddharth Asthana
dc88e349a2 ident: move commit_rewrite_person() to ident.c
commit_rewrite_person() and rewrite_ident_line() are static functions
defined in revision.c.

Their usages are as follows:
- commit_rewrite_person() takes a commit buffer and replaces the author
  and committer idents with their canonical versions using the mailmap
  mechanism
- rewrite_ident_line() takes author/committer header lines from the
  commit buffer and replaces the idents with their canonical versions
  using the mailmap mechanism.

This patch moves commit_rewrite_person() and rewrite_ident_line() to
ident.c which contains many other functions related to idents like
split_ident_line(). By moving commit_rewrite_person() to ident.c, we
also intend to use it in git-cat-file to replace committer and author
idents from the headers to their canonical versions using the mailmap
mechanism. The function is moved as is for now to make it clear that
there are no other changes, but it will be renamed in a following
commit.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 12:55:53 -07:00
Siddharth Asthana
e9c1b0e38c revision: improve commit_rewrite_person()
The function, commit_rewrite_person(), is designed to find and replace
an ident string in the header part, and the way it avoids a random
occurrence of "author A U Thor <author@example.com" in the text is by
insisting "author" to appear at the beginning of line by passing
"\nauthor " as "what".

The implementation also doesn't make any effort to limit itself to the
commit header by locating the blank line that appears after the header
part and stopping the search there. Also, the interface forces the
caller to make multiple calls if it wants to rewrite idents on multiple
headers. It shouldn't be the case.

To support the existing caller better, update commit_rewrite_person()
to:
- Make a single pass in the input buffer to locate headers named
  "author" and "committer" and replace idents on them.
- Stop at the end of the header, ensuring that nothing in the body of
  the commit object is modified.

The return type of the function commit_rewrite_person() has also been
changed from int to void. This has been done because the caller of the
function doesn't do anything with the return value of the function.

By simplifying the interface of the commit_rewrite_person(), we also
intend to expose it as a public function. We will also be renaming the
function in a future commit to a different name which clearly tells that
the function replaces idents in the header of the commit buffer.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 12:55:53 -07:00
Junio C Hamano
5a5ea141e7 revision: mark blobs needed for resolve-undo as reachable
The resolve-undo extension was added to the index in cfc5789a
(resolve-undo: record resolved conflicts in a new index extension
section, 2009-12-25).  This extension records the blob object names
and their modes of conflicted paths when the path gets resolved
(e.g. with "git add"), to allow "undoing" the resolution with
"checkout -m path".  These blob objects should be guarded from
garbage-collection while we have the resolve-undo information in the
index (otherwise unresolve operation may try to use a blob object
that has already been pruned away).

But the code called from mark_reachable_objects() for the index
forgets to do so.  Teach add_index_objects_to_pending() helper to
also add objects referred to by the resolve-undo extension.

Also make matching changes to "fsck", which has code that is fairly
similar to the reachability stuff, but have parallel implementations
for all these stuff, which may (or may not) someday want to be unified.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-09 16:45:07 -07:00
Junio C Hamano
2da81d1efb Merge branch 'ab/plug-leak-in-revisions'
Plug the memory leaks from the trickiest API of all, the revision
walker.

* ab/plug-leak-in-revisions: (27 commits)
  revisions API: add a TODO for diff_free(&revs->diffopt)
  revisions API: have release_revisions() release "topo_walk_info"
  revisions API: have release_revisions() release "date_mode"
  revisions API: call diff_free(&revs->pruning) in revisions_release()
  revisions API: release "reflog_info" in release revisions()
  revisions API: clear "boundary_commits" in release_revisions()
  revisions API: have release_revisions() release "prune_data"
  revisions API: have release_revisions() release "grep_filter"
  revisions API: have release_revisions() release "filter"
  revisions API: have release_revisions() release "cmdline"
  revisions API: have release_revisions() release "mailmap"
  revisions API: have release_revisions() release "commits"
  revisions API users: use release_revisions() for "prune_data" users
  revisions API users: use release_revisions() with UNLEAK()
  revisions API users: use release_revisions() in builtin/log.c
  revisions API users: use release_revisions() in http-push.c
  revisions API users: add "goto cleanup" for release_revisions()
  stash: always have the owner of "stash_info" free it
  revisions API users: use release_revisions() needing REV_INFO_INIT
  revision.[ch]: document and move code declared around "init"
  ...
2022-06-07 14:10:56 -07:00
Junio C Hamano
538dc459a0 Merge branch 'ep/maint-equals-null-cocci'
Introduce and apply coccinelle rule to discourage an explicit
comparison between a pointer and NULL, and applies the clean-up to
the maintenance track.

* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-20 15:26:59 -07:00
Junio C Hamano
2b0a58d164 Merge branch 'ep/maint-equals-null-cocci' for maint-2.35
* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-02 10:06:04 -07:00
Junio C Hamano
afe8a9070b tree-wide: apply equals-null.cocci
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-02 09:50:37 -07:00
Miklos Vajna
96697781e0 log: "--since-as-filter" option is a non-terminating "--since" variant
The "--since=<time>" option of "git log" limits the commits displayed by
the command by stopping the traversal once it sees a commit whose
timestamp is older than the given time and not digging further into its
parents.

This is OK in a history where a commit always has a newer timestamp than
any of its parents'.  Once you see a commit older than the given <time>,
all ancestor commits of it are even older than the time anyway.  It
poses, however, a problem when there is a commit with a wrong timestamp
that makes it appear older than its parents.  Stopping traversal at the
"incorrectly old" commit will hide its ancestors that are newer than
that wrong commit and are newer than the cut-off time given with the
--since option.  --max-age and --after being the synonyms to --since,
they share the same issue.

Add a new "--since-as-filter" option that is a variant of
"--since=<time>".  Instead of stopping the traversal to hide an old
enough commit and its all ancestors, exclude commits with an old
timestamp from the output but still keep digging the history.

Without other traversal stopping options, this will force the command in
"git log" family to dig down the history to the root.  It may be an
acceptable cost for a small project with short history and many commits
with screwy timestamps.

It is quite unlikely for us to add traversal stopper other than since,
so have this as a --since-as-filter option, rather than a separate
--as-filter, that would be probably more confusing.

Signed-off-by: Miklos Vajna <vmiklos@vmiklos.hu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-23 09:36:07 -07:00
Ævar Arnfjörð Bjarmason
54c8a7c379 revisions API: add a TODO for diff_free(&revs->diffopt)
Add a TODO comment indicating that we should release "diffopt" in
release_revisions(). In a preceding commit we started releasing the
"pruning" member of the same type, but handling "diffopt" will require
us to untangle the "no_free" conditions I added in e900d494dc (diff:
add an API for deferred freeing, 2021-02-11).

Let's leave a TODO comment to that effect, and so that we don't forget
refactor code that was changed to use release_revisions() in earlier
commits to stop using the "diffopt" member after a call to
release_revisions(). This works currently, but would become a logic
error as soon as we started freeing "diffopt". Doing that change now
doesn't harm anything, and future-proofs us against a later change to
release_revisions().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason
ae1b383dfa revisions API: have release_revisions() release "topo_walk_info"
Refactor the existing reset_topo_walk() into a thin wrapper for a
release_revisions_topo_walk_info() + resetting the member to "NULL",
and call release_revisions_topo_walk_info() from release_revisions().

This fixes memory leaks that have been with us ever since
"topo_walk_info" was added to revision.[ch] in
f0d9cc4196 (revision.c: begin refactoring --topo-order logic,
2018-11-01).

Due to various other leaks this makes no tests pass in their entirety,
but e.g. before this running this on git.git:

    ./git -P log --pretty=tformat:"%P   %H | %s" --parents --full-history --topo-order -3 -- README.md

Would report under SANITIZE=leak:

    SUMMARY: LeakSanitizer: 531064 byte(s) leaked in 6 allocation(s).

Now we'll free all of that memory.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason
9d5a7df332 revisions API: have release_revisions() release "date_mode"
Extend the the release_revisions() function so that it frees the
"date_mode" in the "struct ref_info".

This uses the date_mode_release() function added in 974c919d36 (date
API: add and use a date_mode_release(), 2022-02-16). As that commit
notes "t7004-tag.sh" tests for the leaks that are being fixed
here. That test now fails "only" 44 tests, instead of the 46 it failed
before this change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason
6ab75ac839 revisions API: call diff_free(&revs->pruning) in revisions_release()
Call diff_free() on the "pruning" member of "struct rev_info".  Doing
so makes several tests pass under SANITIZE=leak.

This was also the last missing piece that allows us to remove the
UNLEAK() in "cmd_diff" and "cmd_diff_index", which allows us to use
those commands as a canary for general leaks in the revisions API. See
[1] for further rationale, and 886e1084d7 (builtin/: add UNLEAKs,
2017-10-01) for the commit that added the UNLEAK() there.

1. https://lore.kernel.org/git/220218.861r00ib86.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason
81ffbf8380 revisions API: release "reflog_info" in release revisions()
Add a missing reflog_walk_info_release() to "reflog-walk.c" and use it
in release_revisions().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason
ab1f6926e9 revisions API: clear "boundary_commits" in release_revisions()
Clear the "boundary_commits" object_array in release_revisions(). This
makes a few more tests pass under SANITIZE=leak, including
"t/t4126-apply-empty.sh" which started failed as an UNLEAK() in
cmd_format_patch() was removed in a preceding commit.

This also re-marks the various tests relying on "git format-patch" as
passing under "SANITIZE=leak", in the preceding "revisions API users:
use release_revisions() in builtin/log.c" commit those were marked as
failing as we removed the UNLEAK(rev) from cmd_format_patch() in
"builtin/log.c".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason
689a8e80dd revisions API: have release_revisions() release "prune_data"
Extend the the release_revisions() function so that it frees the
"prune_data" in the "struct rev_info". This means that any code that
calls "release_revisions()" already can get rid of adjacent calls to
clear_pathspec().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason
f41fb662f5 revisions API: have release_revisions() release "grep_filter"
Extend the the release_revisions() function so that it frees the
"grep_filter" in the "struct rev_info".This allows us to mark a test
as passing under "TEST_PASSES_SANITIZE_LEAK=true".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason
e75d2f7f73 revisions API: have release_revisions() release "filter"
Extend the the release_revisions() function so that it frees the
"filter" in the "struct rev_info". This in combination with a
preceding change to free "cmdline" means that we can mark another set
of tests as passing under "TEST_PASSES_SANITIZE_LEAK=true".

The "filter" member was added recently in ffaa137f64 (revision: put
object filter into struct rev_info, 2022-03-09), and this fixes leaks
intruded in the subsequent leak 7940941de1 (pack-objects: use
rev.filter when possible, 2022-03-09) and 105c6f14ad (bundle: parse
filter capability, 2022-03-09).

The "builtin/pack-objects.c" leak in 7940941de1 was effectively with
us already, but the variable was referred to by a "static" file-scoped
variable. The "bundle.c " leak in 105c6f14ad was newly introduced
with the new "filter" feature for bundles.

The "t5600-clone-fail-cleanup.sh" change here to add
"TEST_PASSES_SANITIZE_LEAK=true" is one of the cases where
run-command.c in not carrying the abort() exit code upwards would have
had that test passing before, but now it *actually* passes[1]. We
should fix the lack of 1=1 mapping of SANITIZE=leak testing to actual
leaks some other time, but it's an existing edge case, let's just mark
the really-passing test as passing for now.

1. https://lore.kernel.org/git/220303.86fsnz5o9w.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason
7a98d9ab00 revisions API: have release_revisions() release "cmdline"
Extend the the release_revisions() function so that it frees the
"cmdline" in the "struct rev_info". This in combination with a
preceding change to free "commits" and "mailmap" means that we can
whitelist another test under "TEST_PASSES_SANITIZE_LEAK=true".

There was a proposal in [1] to do away with xstrdup()-ing this
add_rev_cmdline(), perhaps that would be worthwhile, but for now let's
just free() it.

We could also make that a "char *" in "struct rev_cmdline_entry"
itself, but since we own it let's expose it as a constant to outside
callers. I proposed that in [2] but have since changed my mind. See
14d30cdfc0 (ref-filter: fix memory leak in `free_array_item()`,
2019-07-10), c514c62a4f (checkout: fix leak of non-existent branch
names, 2020-08-14) and other log history hits for "free((char *)" for
prior art.

This includes the tests we had false-positive passes on before my
6798b08e84 (perl Git.pm: don't ignore signalled failure in
_cmd_close(), 2022-02-01), now they pass for real.

Since there are 66 tests matching t/t[0-9]*git-svn*.sh it's easier to
list those that don't pass than to touch most of those 66. So let's
introduce a "TEST_FAILS_SANITIZE_LEAK=true", which if set in the tests
won't cause lib-git-svn.sh to set "TEST_PASSES_SANITIZE_LEAK=true.

This change also marks all the tests that we removed
"TEST_FAILS_SANITIZE_LEAK=true" from in an earlier commit due to
removing the UNLEAK() from cmd_format_patch(), we can now assert that
its API use doesn't leak any "struct rev_info" memory.

This change also made commit "t5503-tagfollow.sh" pass on current
master, but that would regress when combined with
ps/fetch-atomic-fixup's de004e848a (t5503: simplify setup of test
which exercises failure of backfill, 2022-03-03) (through no fault of
that topic, that change started using "git clone" in the test, which
has an outstanding leak). Let's leave that test out for now to avoid
in-flight semantic conflicts.

1. https://lore.kernel.org/git/YUj%2FgFRh6pwrZalY@carlos-mbp.lan/
2. https://lore.kernel.org/git/87o88obkb1.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason
a52f07afcb revisions API: have release_revisions() release "mailmap"
Extend the the release_revisions() function so that it frees the
"mailmap" in the "struct rev_info".

The log family of functions now calls the clear_mailmap() function
added in fa8afd18e5a (revisions API: provide and use a
release_revisions(), 2021-09-19), allowing us to whitelist some tests
with "TEST_PASSES_SANITIZE_LEAK=true".

Unfortunately having a pointer to a mailmap in "struct rev_info"
instead of an embedded member that we "own" get a bit messy, as can be
seen in the change to builtin/commit.c.

When we free() this data we won't be able to tell apart a pointer to a
"mailmap" on the heap from one on the stack. As seen in
ea57bc0d41 (log: add --use-mailmap option, 2013-01-05) the "log"
family allocates it on the heap, but in the find_author_by_nickname()
code added in ea16794e43 (commit: search author pattern against
mailmap, 2013-08-23) we allocated it on the stack instead.

Ideally we'd simply change that member to a "struct string_list
mailmap" and never free() the "mailmap" itself, but that would be a
much larger change to the revisions API.

We have code that needs to hand an existing "mailmap" to a "struct
rev_info", while we could change all of that, let's not go there
now.

The complexity isn't in the ownership of the "mailmap" per-se, but
that various things assume a "rev_info.mailmap == NULL" means "doesn't
want mailmap", if we changed that to an init'd "struct string_list
we'd need to carefully refactor things to change those assumptions.

Let's instead always free() it, and simply declare that if you add
such a "mailmap" it must be allocated on the heap. Any modern libc
will correctly panic if we free() a stack variable, so this should be
safe going forward.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason
e966fc5a89 revisions API: have release_revisions() release "commits"
Extend the the release_revisions() function so that it frees the
"commits" in the "struct rev_info".

We don't expect to use this "struct rev_info" again, so there's no
reason to NULL out revs->commits, as e.g. simplify_merges() and
create_boundary_commit_list() do.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason
296a143845 revision.[ch]: document and move code declared around "init"
A subsequent commit will add "REV_INFO_INIT" macro adjacent to
repo_init_revisions(), unfortunately between the "struct rev_info"
itself and that function we've added various miscellaneous code
between the two over the years.

Let's move that code either lower in revision.h, giving it API docs
while we're at it, or in cases where it wasn't public API at all move
it into revision.c No lines of code are changed here, only moved
around. The only changes are the addition of new API comments.

The "tree_difference" variable could also be declared like this, which
I think would be a lot clearer, but let's leave that for now to keep
this a move-only change:

	static enum {
		REV_TREE_SAME,
		REV_TREE_NEW, /* Only new files */
		REV_TREE_OLD, /* Only files removed */
		REV_TREE_DIFFERENT, /* Mixed changes */
	} tree_difference = REV_TREE_SAME;

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:08 -07:00
Ævar Arnfjörð Bjarmason
1878b5edc0 revision.[ch]: provide and start using a release_revisions()
The users of the revision.[ch] API's "struct rev_info" are a major
source of memory leaks in the test suite under SANITIZE=leak, which in
turn adds a lot of noise when trying to mark up tests with
"TEST_PASSES_SANITIZE_LEAK=true".

The users of that API are largely one-shot, e.g. "git rev-list" or
"git log", or the "git checkout" and "git stash" being modified here

For these callers freeing the memory is arguably a waste of time, but
in many cases they've actually been trying to free the memory, and
just doing that in a buggy manner.

Let's provide a release_revisions() function for these users, and
start migrating them over per the plan outlined in [1]. Right now this
only handles the "pending" member of the struct, but more will be
added in subsequent commits.

Even though we only clear the "pending" member now, let's not leave a
trap in code like the pre-image of index_differs_from(), where we'd
start doing the wrong thing as soon as the release_revisions() learned
to clear its "diffopt". I.e. we need to call release_revisions() after
we've inspected any state in "struct rev_info".

This leaves in place e.g. clear_pathspec(&rev.prune_data) in
stash_working_tree() in builtin/stash.c, subsequent commits will teach
release_revisions() to free "prune_data" and other members that in
some cases are individually cleared by users of "struct rev_info" by
reaching into its members. Those subsequent commits will remove the
relevant calls to e.g. clear_pathspec().

We avoid amending code in index_differs_from() in diff-lib.c as well
as wt_status_collect_changes_index(), has_unstaged_changes() and
has_uncommitted_changes() in wt-status.c in a way that assumes that we
are already clearing the "diffopt" member. That will be handled in a
subsequent commit.

1. https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:08 -07:00
Ævar Arnfjörð Bjarmason
bf20fe4ca8 cocci: add and apply free_commit_list() rules
Add and apply coccinelle rules to remove "if (E)" before
"free_commit_list(E)", the function can accept NULL, and further
change cases where "E = NULL" followed to also be unconditionally.

The code changes in this commit were entirely made by the coccinelle
rule being added here, and applied with:

    make contrib/coccinelle/free.cocci.patch
    patch -p1 <contrib/coccinelle/free.cocci.patch

The only manual intervention here is that the the relevant code in
commit.c has been manually re-indented.

Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:08 -07:00
Derrick Stolee
cc91044256 list-objects-filter: remove CL_ARG__FILTER
We have established the command-line interface for the --[no-]filter
options for a while now, so we do not need a helper to make this
editable in the future.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-23 13:13:17 -07:00
Junio C Hamano
7391ecd338 Merge branch 'ds/partial-bundles'
Bundle file format gets extended to allow a partial bundle,
filtered by similar criteria you would give when making a
partial/lazy clone.

* ds/partial-bundles:
  clone: fail gracefully when cloning filtered bundle
  bundle: unbundle promisor packs
  bundle: create filtered bundles
  rev-list: move --filter parsing into revision.c
  bundle: parse filter capability
  list-objects: handle NULL function pointers
  MyFirstObjectWalk: update recommended usage
  list-objects: consolidate traverse_commit_list[_filtered]
  pack-bitmap: drop filter in prepare_bitmap_walk()
  pack-objects: use rev.filter when possible
  revision: put object filter into struct rev_info
  list-objects-filter-options: create copy helper
  index-pack: document and test the --promisor option
2022-03-21 15:14:24 -07:00
Derrick Stolee
c4ea513f4a rev-list: move --filter parsing into revision.c
Now that 'struct rev_info' has a 'filter' member and most consumers of
object filtering are using that member instead of an external struct,
move the parsing of the '--filter' option out of builtin/rev-list.c and
into revision.c.

This use within handle_revision_pseudo_opt() allows us to find the
option within setup_revisions() if the arguments are passed directly. In
the case of a command such as 'git blame', the arguments are first
scanned and checked with parse_revision_opt(), which complains about the
option, so 'git blame --filter=blob:none <file>' does not become valid
with this change.

Some commands, such as 'git diff' gain this option without having it
make an effect. And 'git diff --objects' was already possible, but does
not actually make sense in that builtin.

The key addition that is coming is 'git bundle create --filter=<X>' so
we can create bundles containing promisor packs. More work is required
to make them fully functional, but that will follow.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-09 10:25:27 -08:00
Junio C Hamano
5b84280c65 Merge branch 'ab/grep-patterntype'
Some code clean-up in the "git grep" machinery.

* ab/grep-patterntype:
  grep: simplify config parsing and option parsing
  grep.c: do "if (bool && memchr())" not "if (memchr() && bool)"
  grep.h: make "grep_opt.pattern_type_option" use its enum
  grep API: call grep_config() after grep_init()
  grep.c: don't pass along NULL callback value
  built-ins: trust the "prefix" from run_builtin()
  grep tests: add missing "grep.patternType" config tests
  grep tests: create a helper function for "BRE" or "ERE"
  log tests: check if grep_config() is called by "log"-like cmds
  grep.h: remove unused "regex_t regexp" from grep_opt
2022-02-25 15:47:36 -08:00
Junio C Hamano
8813596531 Merge branch 'ah/log-no-graph'
"git log --graph --graph" used to leak a graph structure, and there
was no way to countermand "--graph" that appear earlier on the
command line.  A "--no-graph" option has been added and resource
leakage has been plugged.

* ah/log-no-graph:
  log: add a --no-graph option
  log: fix memory leak if --graph is passed multiple times
2022-02-23 16:58:03 -08:00
Ævar Arnfjörð Bjarmason
04bf052eef grep: simplify config parsing and option parsing
Simplify the parsing of "grep.patternType" and
"grep.extendedRegexp". This changes no behavior, but gets rid of
complex parsing logic that isn't needed anymore.

When "grep.patternType" was introduced in 84befcd0a4 (grep: add a
grep.patternType configuration setting, 2012-08-03) we promised that:

 1. You can set "grep.patternType", and "[setting it to] 'default'
    will return to the default matching behavior".

    In that context "the default" meant whatever the configuration
    system specified before that change, i.e. via grep.extendedRegexp.

 2. We'd support the existing "grep.extendedRegexp" option, but ignore
    it when the new "grep.patternType" option is set. We said we'd
    only ignore the older "grep.extendedRegexp" option "when the
    `grep.patternType` option is set to a value other than
    'default'".

In a preceding commit we changed grep_config() to be called after
grep_init(), which means that much of the complexity here can go
away.

As before both "grep.patternType" and "grep.extendedRegexp" are
last-one-wins variable, with "grep.extendedRegexp" yielding to
"grep.patternType", except when "grep.patternType=default".

Note that as the previously added tests indicate this cannot be done
on-the-fly as we see the config variables, without introducing more
state keeping. I.e. if we see:

    -c grep.extendedRegexp=false
    -c grep.patternType=default
    -c extendedRegexp=true

We need to select ERE, since grep.patternType=default unselects that
variable, which normally has higher precedence, but we also need to
select BRE in cases of:

    -c grep.extendedRegexp=true \
    -c grep.extendedRegexp=false

Which would not be the case for this, which select ERE:

    -c grep.patternType=extended \
    -c grep.extendedRegexp=false

Therefore we cannot do this on-the-fly in grep_config without also
introducing tracking variables for not only the pattern type, but what
the source of that pattern type was.

So we need to decide on the pattern after our config was fully
parsed. Let's do that by deferring the decision on the pattern type
until it's time to compile it in compile_regexp().

By that time we've not only parsed the config, but also handled the
command-line options. Those will set "opt.pattern_type_option" (*not*
"opt.extended_regexp_option"!).

At that point all we need to do is see if "grep.patternType" was
UNSPECIFIED in the end (including an explicit "=default"), if so we'll
use the "grep.extendedRegexp" configuration, if any.

See my 07a3d41173 (grep: remove regflags from the public grep_opt
API, 2017-06-29) for addition of the two comments being removed here,
i.e. the complexity noted in that commit is now going away.

1. https://lore.kernel.org/git/patch-v8-09.10-c211bb0c69d-20220118T155211Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-15 18:00:50 -08:00
Ævar Arnfjörð Bjarmason
9725c8dda2 built-ins: trust the "prefix" from run_builtin()
Change code in "builtin/grep.c" and "builtin/ls-tree.c" to trust the
"prefix" passed from "run_builtin()". The "prefix" we get from setup.c
is either going to be NULL or a string of length >0, never "".

So we can drop the "prefix && *prefix" checks added for
"builtin/grep.c" in 0d042fecf2 (git-grep: show pathnames relative to
the current directory, 2006-08-11), and for "builtin/ls-tree.c" in
a69dd585fc (ls-tree: chomp leading directories when run from a
subdirectory, 2005-12-23).

As seen in code in revision.c that was added in cd676a5136 (diff
--relative: output paths as relative to the current subdirectory,
2008-02-12) we already have existing code that does away with this
assertion.

This makes it easier to reason about a subsequent change to the
"prefix_length" code in grep.c in a subsequent commit, and since we're
going to the trouble of doing that let's leave behind an assert() to
promise this to any future callers.

For "builtin/grep.c" it would be painful to pass the "prefix" down the
callchain of:

    cmd_grep -> grep_tree -> grep_submodule -> grep_cache -> grep_oid ->
    grep_source_name

So for the code that needs it in grep_source_name() let's add a
"grep_prefix" variable similar to the existing "ls_tree_prefix".

While at it let's move the code in cmd_ls_tree() around so that we
assign to the "ls_tree_prefix" right after declaring the variables,
and stop assigning to "prefix". We only subsequently used that
variable later in the function after clobbering it. Let's just use our
own "grep_prefix" instead.

Let's also add an assert() in git.c, so that we'll make this promise
about the "prefix" to any current and future callers, as well as to
any readers of the code.

Code history:

 * The strlen() in "grep.c" hasn't been used since 493b7a08d8 (grep:
   accept relative paths outside current working directory, 2009-09-05).

   When that code was added in 0d042fecf2 (git-grep: show pathnames
   relative to the current directory, 2006-08-11) we used the length.

   But since 493b7a08d8 we haven't used it for anything except a
   boolean check that we could have done on the "prefix" member
   itself.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-15 18:00:50 -08:00
Alex Henrie
087c745833 log: add a --no-graph option
It's useful to be able to countermand a previous --graph option, for
example if `git log --graph` is run via an alias.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-11 10:06:41 -08:00
Alex Henrie
dccf6c16f1 log: fix memory leak if --graph is passed multiple times
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-11 10:06:40 -08:00
Jerry Zhang
9d505b7b49 git-rev-list: add --exclude-first-parent-only flag
It is useful to know when a branch first diverged in history
from some integration branch in order to be able to enumerate
the user's local changes. However, these local changes can
include arbitrary merges, so it is necessary to ignore this
merge structure when finding the divergence point.

In order to do this, teach the "rev-list" family to accept
"--exclude-first-parent-only", which restricts the traversal
of excluded commits to only follow first parent links.

   -A-----E-F-G--main
     \   / /
      B-C-D--topic

In this example, the goal is to return the set {B, C, D} which
represents a topic branch that has been merged into main branch.
`git rev-list topic ^main` will end up returning no commits
since excluding main will end up traversing the commits on topic
as well. `git rev-list --exclude-first-parent-only topic ^main`
however will return {B, C, D} as desired.

Add docs for the new flag, and clarify the doc for --first-parent
to indicate that it applies to traversing the set of included
commits only.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-12 11:08:42 -08:00
Junio C Hamano
c17de5a505 Merge branch 'ja/i18n-similar-messages'
Similar message templates have been consolidated so that
translators need to work on fewer number of messages.

* ja/i18n-similar-messages:
  i18n: turn even more messages into "cannot be used together" ones
  i18n: ref-filter: factorize "%(foo) atom used without %(bar) atom"
  i18n: factorize "--foo outside a repository"
  i18n: refactor "unrecognized %(foo) argument" strings
  i18n: factorize "no directory given for --foo"
  i18n: factorize "--foo requires --bar" and the like
  i18n: tag.c factorize i18n strings
  i18n: standardize "cannot open" and "cannot read"
  i18n: turn "options are incompatible" into "cannot be used together"
  i18n: refactor "%s, %s and %s are mutually exclusive"
  i18n: refactor "foo and bar are mutually exclusive"
2022-01-10 11:52:56 -08:00
Junio C Hamano
2043ce828e Merge branch 'rs/log-invert-grep-with-headers'
"git log --invert-grep --author=<name>" used to exclude commits
written by the given author, but now "--invert-grep" only affects
the matches made by the "--grep=<pattern>" option.

* rs/log-invert-grep-with-headers:
  log: let --invert-grep only invert --grep
2022-01-05 14:01:30 -08:00
Jean-Noël Avila
6fa00ee843 i18n: factorize "--foo requires --bar" and the like
They are all replaced by "the option '%s' requires '%s'", which is a
new string but replaces 17 previous unique strings.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-05 13:31:00 -08:00
Jean-Noël Avila
12909b6b8a i18n: turn "options are incompatible" into "cannot be used together"
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-05 13:29:23 -08:00
Junio C Hamano
5a4069a1d8 Merge branch 'jc/c99-var-decl-in-for-loop'
Weather balloon to find compilers that do not grok variable
declaration in the for() loop.

* jc/c99-var-decl-in-for-loop:
  revision: use C99 declaration of variable in for() loop
2021-12-21 15:03:15 -08:00