Commit graph

912 commits

Author SHA1 Message Date
Junio C Hamano 418aef9055 Merge branch 'jc/resolve-undo'
The resolve-undo information in the index was not protected against
GC, which has been corrected.

* jc/resolve-undo:
  fsck: do not dereference NULL while checking resolve-undo data
  revision: mark blobs needed for resolve-undo as reachable
2022-07-19 16:40:16 -07:00
Junio C Hamano 5a5ea141e7 revision: mark blobs needed for resolve-undo as reachable
The resolve-undo extension was added to the index in cfc5789a
(resolve-undo: record resolved conflicts in a new index extension
section, 2009-12-25).  This extension records the blob object names
and their modes of conflicted paths when the path gets resolved
(e.g. with "git add"), to allow "undoing" the resolution with
"checkout -m path".  These blob objects should be guarded from
garbage-collection while we have the resolve-undo information in the
index (otherwise unresolve operation may try to use a blob object
that has already been pruned away).

But the code called from mark_reachable_objects() for the index
forgets to do so.  Teach add_index_objects_to_pending() helper to
also add objects referred to by the resolve-undo extension.

Also make matching changes to "fsck", which has code that is fairly
similar to the reachability stuff, but have parallel implementations
for all these stuff, which may (or may not) someday want to be unified.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-09 16:45:07 -07:00
Junio C Hamano 2da81d1efb Merge branch 'ab/plug-leak-in-revisions'
Plug the memory leaks from the trickiest API of all, the revision
walker.

* ab/plug-leak-in-revisions: (27 commits)
  revisions API: add a TODO for diff_free(&revs->diffopt)
  revisions API: have release_revisions() release "topo_walk_info"
  revisions API: have release_revisions() release "date_mode"
  revisions API: call diff_free(&revs->pruning) in revisions_release()
  revisions API: release "reflog_info" in release revisions()
  revisions API: clear "boundary_commits" in release_revisions()
  revisions API: have release_revisions() release "prune_data"
  revisions API: have release_revisions() release "grep_filter"
  revisions API: have release_revisions() release "filter"
  revisions API: have release_revisions() release "cmdline"
  revisions API: have release_revisions() release "mailmap"
  revisions API: have release_revisions() release "commits"
  revisions API users: use release_revisions() for "prune_data" users
  revisions API users: use release_revisions() with UNLEAK()
  revisions API users: use release_revisions() in builtin/log.c
  revisions API users: use release_revisions() in http-push.c
  revisions API users: add "goto cleanup" for release_revisions()
  stash: always have the owner of "stash_info" free it
  revisions API users: use release_revisions() needing REV_INFO_INIT
  revision.[ch]: document and move code declared around "init"
  ...
2022-06-07 14:10:56 -07:00
Junio C Hamano 538dc459a0 Merge branch 'ep/maint-equals-null-cocci'
Introduce and apply coccinelle rule to discourage an explicit
comparison between a pointer and NULL, and applies the clean-up to
the maintenance track.

* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-20 15:26:59 -07:00
Junio C Hamano 2b0a58d164 Merge branch 'ep/maint-equals-null-cocci' for maint-2.35
* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-02 10:06:04 -07:00
Junio C Hamano afe8a9070b tree-wide: apply equals-null.cocci
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-02 09:50:37 -07:00
Miklos Vajna 96697781e0 log: "--since-as-filter" option is a non-terminating "--since" variant
The "--since=<time>" option of "git log" limits the commits displayed by
the command by stopping the traversal once it sees a commit whose
timestamp is older than the given time and not digging further into its
parents.

This is OK in a history where a commit always has a newer timestamp than
any of its parents'.  Once you see a commit older than the given <time>,
all ancestor commits of it are even older than the time anyway.  It
poses, however, a problem when there is a commit with a wrong timestamp
that makes it appear older than its parents.  Stopping traversal at the
"incorrectly old" commit will hide its ancestors that are newer than
that wrong commit and are newer than the cut-off time given with the
--since option.  --max-age and --after being the synonyms to --since,
they share the same issue.

Add a new "--since-as-filter" option that is a variant of
"--since=<time>".  Instead of stopping the traversal to hide an old
enough commit and its all ancestors, exclude commits with an old
timestamp from the output but still keep digging the history.

Without other traversal stopping options, this will force the command in
"git log" family to dig down the history to the root.  It may be an
acceptable cost for a small project with short history and many commits
with screwy timestamps.

It is quite unlikely for us to add traversal stopper other than since,
so have this as a --since-as-filter option, rather than a separate
--as-filter, that would be probably more confusing.

Signed-off-by: Miklos Vajna <vmiklos@vmiklos.hu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-23 09:36:07 -07:00
Ævar Arnfjörð Bjarmason 54c8a7c379 revisions API: add a TODO for diff_free(&revs->diffopt)
Add a TODO comment indicating that we should release "diffopt" in
release_revisions(). In a preceding commit we started releasing the
"pruning" member of the same type, but handling "diffopt" will require
us to untangle the "no_free" conditions I added in e900d494dc (diff:
add an API for deferred freeing, 2021-02-11).

Let's leave a TODO comment to that effect, and so that we don't forget
refactor code that was changed to use release_revisions() in earlier
commits to stop using the "diffopt" member after a call to
release_revisions(). This works currently, but would become a logic
error as soon as we started freeing "diffopt". Doing that change now
doesn't harm anything, and future-proofs us against a later change to
release_revisions().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason ae1b383dfa revisions API: have release_revisions() release "topo_walk_info"
Refactor the existing reset_topo_walk() into a thin wrapper for a
release_revisions_topo_walk_info() + resetting the member to "NULL",
and call release_revisions_topo_walk_info() from release_revisions().

This fixes memory leaks that have been with us ever since
"topo_walk_info" was added to revision.[ch] in
f0d9cc4196 (revision.c: begin refactoring --topo-order logic,
2018-11-01).

Due to various other leaks this makes no tests pass in their entirety,
but e.g. before this running this on git.git:

    ./git -P log --pretty=tformat:"%P   %H | %s" --parents --full-history --topo-order -3 -- README.md

Would report under SANITIZE=leak:

    SUMMARY: LeakSanitizer: 531064 byte(s) leaked in 6 allocation(s).

Now we'll free all of that memory.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason 9d5a7df332 revisions API: have release_revisions() release "date_mode"
Extend the the release_revisions() function so that it frees the
"date_mode" in the "struct ref_info".

This uses the date_mode_release() function added in 974c919d36 (date
API: add and use a date_mode_release(), 2022-02-16). As that commit
notes "t7004-tag.sh" tests for the leaks that are being fixed
here. That test now fails "only" 44 tests, instead of the 46 it failed
before this change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason 6ab75ac839 revisions API: call diff_free(&revs->pruning) in revisions_release()
Call diff_free() on the "pruning" member of "struct rev_info".  Doing
so makes several tests pass under SANITIZE=leak.

This was also the last missing piece that allows us to remove the
UNLEAK() in "cmd_diff" and "cmd_diff_index", which allows us to use
those commands as a canary for general leaks in the revisions API. See
[1] for further rationale, and 886e1084d7 (builtin/: add UNLEAKs,
2017-10-01) for the commit that added the UNLEAK() there.

1. https://lore.kernel.org/git/220218.861r00ib86.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason 81ffbf8380 revisions API: release "reflog_info" in release revisions()
Add a missing reflog_walk_info_release() to "reflog-walk.c" and use it
in release_revisions().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:10 -07:00
Ævar Arnfjörð Bjarmason ab1f6926e9 revisions API: clear "boundary_commits" in release_revisions()
Clear the "boundary_commits" object_array in release_revisions(). This
makes a few more tests pass under SANITIZE=leak, including
"t/t4126-apply-empty.sh" which started failed as an UNLEAK() in
cmd_format_patch() was removed in a preceding commit.

This also re-marks the various tests relying on "git format-patch" as
passing under "SANITIZE=leak", in the preceding "revisions API users:
use release_revisions() in builtin/log.c" commit those were marked as
failing as we removed the UNLEAK(rev) from cmd_format_patch() in
"builtin/log.c".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason 689a8e80dd revisions API: have release_revisions() release "prune_data"
Extend the the release_revisions() function so that it frees the
"prune_data" in the "struct rev_info". This means that any code that
calls "release_revisions()" already can get rid of adjacent calls to
clear_pathspec().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason f41fb662f5 revisions API: have release_revisions() release "grep_filter"
Extend the the release_revisions() function so that it frees the
"grep_filter" in the "struct rev_info".This allows us to mark a test
as passing under "TEST_PASSES_SANITIZE_LEAK=true".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason e75d2f7f73 revisions API: have release_revisions() release "filter"
Extend the the release_revisions() function so that it frees the
"filter" in the "struct rev_info". This in combination with a
preceding change to free "cmdline" means that we can mark another set
of tests as passing under "TEST_PASSES_SANITIZE_LEAK=true".

The "filter" member was added recently in ffaa137f64 (revision: put
object filter into struct rev_info, 2022-03-09), and this fixes leaks
intruded in the subsequent leak 7940941de1 (pack-objects: use
rev.filter when possible, 2022-03-09) and 105c6f14ad (bundle: parse
filter capability, 2022-03-09).

The "builtin/pack-objects.c" leak in 7940941de1 was effectively with
us already, but the variable was referred to by a "static" file-scoped
variable. The "bundle.c " leak in 105c6f14ad was newly introduced
with the new "filter" feature for bundles.

The "t5600-clone-fail-cleanup.sh" change here to add
"TEST_PASSES_SANITIZE_LEAK=true" is one of the cases where
run-command.c in not carrying the abort() exit code upwards would have
had that test passing before, but now it *actually* passes[1]. We
should fix the lack of 1=1 mapping of SANITIZE=leak testing to actual
leaks some other time, but it's an existing edge case, let's just mark
the really-passing test as passing for now.

1. https://lore.kernel.org/git/220303.86fsnz5o9w.gmgdl@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason 7a98d9ab00 revisions API: have release_revisions() release "cmdline"
Extend the the release_revisions() function so that it frees the
"cmdline" in the "struct rev_info". This in combination with a
preceding change to free "commits" and "mailmap" means that we can
whitelist another test under "TEST_PASSES_SANITIZE_LEAK=true".

There was a proposal in [1] to do away with xstrdup()-ing this
add_rev_cmdline(), perhaps that would be worthwhile, but for now let's
just free() it.

We could also make that a "char *" in "struct rev_cmdline_entry"
itself, but since we own it let's expose it as a constant to outside
callers. I proposed that in [2] but have since changed my mind. See
14d30cdfc0 (ref-filter: fix memory leak in `free_array_item()`,
2019-07-10), c514c62a4f (checkout: fix leak of non-existent branch
names, 2020-08-14) and other log history hits for "free((char *)" for
prior art.

This includes the tests we had false-positive passes on before my
6798b08e84 (perl Git.pm: don't ignore signalled failure in
_cmd_close(), 2022-02-01), now they pass for real.

Since there are 66 tests matching t/t[0-9]*git-svn*.sh it's easier to
list those that don't pass than to touch most of those 66. So let's
introduce a "TEST_FAILS_SANITIZE_LEAK=true", which if set in the tests
won't cause lib-git-svn.sh to set "TEST_PASSES_SANITIZE_LEAK=true.

This change also marks all the tests that we removed
"TEST_FAILS_SANITIZE_LEAK=true" from in an earlier commit due to
removing the UNLEAK() from cmd_format_patch(), we can now assert that
its API use doesn't leak any "struct rev_info" memory.

This change also made commit "t5503-tagfollow.sh" pass on current
master, but that would regress when combined with
ps/fetch-atomic-fixup's de004e848a (t5503: simplify setup of test
which exercises failure of backfill, 2022-03-03) (through no fault of
that topic, that change started using "git clone" in the test, which
has an outstanding leak). Let's leave that test out for now to avoid
in-flight semantic conflicts.

1. https://lore.kernel.org/git/YUj%2FgFRh6pwrZalY@carlos-mbp.lan/
2. https://lore.kernel.org/git/87o88obkb1.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason a52f07afcb revisions API: have release_revisions() release "mailmap"
Extend the the release_revisions() function so that it frees the
"mailmap" in the "struct rev_info".

The log family of functions now calls the clear_mailmap() function
added in fa8afd18e5a (revisions API: provide and use a
release_revisions(), 2021-09-19), allowing us to whitelist some tests
with "TEST_PASSES_SANITIZE_LEAK=true".

Unfortunately having a pointer to a mailmap in "struct rev_info"
instead of an embedded member that we "own" get a bit messy, as can be
seen in the change to builtin/commit.c.

When we free() this data we won't be able to tell apart a pointer to a
"mailmap" on the heap from one on the stack. As seen in
ea57bc0d41 (log: add --use-mailmap option, 2013-01-05) the "log"
family allocates it on the heap, but in the find_author_by_nickname()
code added in ea16794e43 (commit: search author pattern against
mailmap, 2013-08-23) we allocated it on the stack instead.

Ideally we'd simply change that member to a "struct string_list
mailmap" and never free() the "mailmap" itself, but that would be a
much larger change to the revisions API.

We have code that needs to hand an existing "mailmap" to a "struct
rev_info", while we could change all of that, let's not go there
now.

The complexity isn't in the ownership of the "mailmap" per-se, but
that various things assume a "rev_info.mailmap == NULL" means "doesn't
want mailmap", if we changed that to an init'd "struct string_list
we'd need to carefully refactor things to change those assumptions.

Let's instead always free() it, and simply declare that if you add
such a "mailmap" it must be allocated on the heap. Any modern libc
will correctly panic if we free() a stack variable, so this should be
safe going forward.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason e966fc5a89 revisions API: have release_revisions() release "commits"
Extend the the release_revisions() function so that it frees the
"commits" in the "struct rev_info".

We don't expect to use this "struct rev_info" again, so there's no
reason to NULL out revs->commits, as e.g. simplify_merges() and
create_boundary_commit_list() do.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason 296a143845 revision.[ch]: document and move code declared around "init"
A subsequent commit will add "REV_INFO_INIT" macro adjacent to
repo_init_revisions(), unfortunately between the "struct rev_info"
itself and that function we've added various miscellaneous code
between the two over the years.

Let's move that code either lower in revision.h, giving it API docs
while we're at it, or in cases where it wasn't public API at all move
it into revision.c No lines of code are changed here, only moved
around. The only changes are the addition of new API comments.

The "tree_difference" variable could also be declared like this, which
I think would be a lot clearer, but let's leave that for now to keep
this a move-only change:

	static enum {
		REV_TREE_SAME,
		REV_TREE_NEW, /* Only new files */
		REV_TREE_OLD, /* Only files removed */
		REV_TREE_DIFFERENT, /* Mixed changes */
	} tree_difference = REV_TREE_SAME;

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:08 -07:00
Ævar Arnfjörð Bjarmason 1878b5edc0 revision.[ch]: provide and start using a release_revisions()
The users of the revision.[ch] API's "struct rev_info" are a major
source of memory leaks in the test suite under SANITIZE=leak, which in
turn adds a lot of noise when trying to mark up tests with
"TEST_PASSES_SANITIZE_LEAK=true".

The users of that API are largely one-shot, e.g. "git rev-list" or
"git log", or the "git checkout" and "git stash" being modified here

For these callers freeing the memory is arguably a waste of time, but
in many cases they've actually been trying to free the memory, and
just doing that in a buggy manner.

Let's provide a release_revisions() function for these users, and
start migrating them over per the plan outlined in [1]. Right now this
only handles the "pending" member of the struct, but more will be
added in subsequent commits.

Even though we only clear the "pending" member now, let's not leave a
trap in code like the pre-image of index_differs_from(), where we'd
start doing the wrong thing as soon as the release_revisions() learned
to clear its "diffopt". I.e. we need to call release_revisions() after
we've inspected any state in "struct rev_info".

This leaves in place e.g. clear_pathspec(&rev.prune_data) in
stash_working_tree() in builtin/stash.c, subsequent commits will teach
release_revisions() to free "prune_data" and other members that in
some cases are individually cleared by users of "struct rev_info" by
reaching into its members. Those subsequent commits will remove the
relevant calls to e.g. clear_pathspec().

We avoid amending code in index_differs_from() in diff-lib.c as well
as wt_status_collect_changes_index(), has_unstaged_changes() and
has_uncommitted_changes() in wt-status.c in a way that assumes that we
are already clearing the "diffopt" member. That will be handled in a
subsequent commit.

1. https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:08 -07:00
Ævar Arnfjörð Bjarmason bf20fe4ca8 cocci: add and apply free_commit_list() rules
Add and apply coccinelle rules to remove "if (E)" before
"free_commit_list(E)", the function can accept NULL, and further
change cases where "E = NULL" followed to also be unconditionally.

The code changes in this commit were entirely made by the coccinelle
rule being added here, and applied with:

    make contrib/coccinelle/free.cocci.patch
    patch -p1 <contrib/coccinelle/free.cocci.patch

The only manual intervention here is that the the relevant code in
commit.c has been manually re-indented.

Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:08 -07:00
Derrick Stolee cc91044256 list-objects-filter: remove CL_ARG__FILTER
We have established the command-line interface for the --[no-]filter
options for a while now, so we do not need a helper to make this
editable in the future.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-23 13:13:17 -07:00
Junio C Hamano 7391ecd338 Merge branch 'ds/partial-bundles'
Bundle file format gets extended to allow a partial bundle,
filtered by similar criteria you would give when making a
partial/lazy clone.

* ds/partial-bundles:
  clone: fail gracefully when cloning filtered bundle
  bundle: unbundle promisor packs
  bundle: create filtered bundles
  rev-list: move --filter parsing into revision.c
  bundle: parse filter capability
  list-objects: handle NULL function pointers
  MyFirstObjectWalk: update recommended usage
  list-objects: consolidate traverse_commit_list[_filtered]
  pack-bitmap: drop filter in prepare_bitmap_walk()
  pack-objects: use rev.filter when possible
  revision: put object filter into struct rev_info
  list-objects-filter-options: create copy helper
  index-pack: document and test the --promisor option
2022-03-21 15:14:24 -07:00
Derrick Stolee c4ea513f4a rev-list: move --filter parsing into revision.c
Now that 'struct rev_info' has a 'filter' member and most consumers of
object filtering are using that member instead of an external struct,
move the parsing of the '--filter' option out of builtin/rev-list.c and
into revision.c.

This use within handle_revision_pseudo_opt() allows us to find the
option within setup_revisions() if the arguments are passed directly. In
the case of a command such as 'git blame', the arguments are first
scanned and checked with parse_revision_opt(), which complains about the
option, so 'git blame --filter=blob:none <file>' does not become valid
with this change.

Some commands, such as 'git diff' gain this option without having it
make an effect. And 'git diff --objects' was already possible, but does
not actually make sense in that builtin.

The key addition that is coming is 'git bundle create --filter=<X>' so
we can create bundles containing promisor packs. More work is required
to make them fully functional, but that will follow.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-09 10:25:27 -08:00
Junio C Hamano 5b84280c65 Merge branch 'ab/grep-patterntype'
Some code clean-up in the "git grep" machinery.

* ab/grep-patterntype:
  grep: simplify config parsing and option parsing
  grep.c: do "if (bool && memchr())" not "if (memchr() && bool)"
  grep.h: make "grep_opt.pattern_type_option" use its enum
  grep API: call grep_config() after grep_init()
  grep.c: don't pass along NULL callback value
  built-ins: trust the "prefix" from run_builtin()
  grep tests: add missing "grep.patternType" config tests
  grep tests: create a helper function for "BRE" or "ERE"
  log tests: check if grep_config() is called by "log"-like cmds
  grep.h: remove unused "regex_t regexp" from grep_opt
2022-02-25 15:47:36 -08:00
Junio C Hamano 8813596531 Merge branch 'ah/log-no-graph'
"git log --graph --graph" used to leak a graph structure, and there
was no way to countermand "--graph" that appear earlier on the
command line.  A "--no-graph" option has been added and resource
leakage has been plugged.

* ah/log-no-graph:
  log: add a --no-graph option
  log: fix memory leak if --graph is passed multiple times
2022-02-23 16:58:03 -08:00
Ævar Arnfjörð Bjarmason 04bf052eef grep: simplify config parsing and option parsing
Simplify the parsing of "grep.patternType" and
"grep.extendedRegexp". This changes no behavior, but gets rid of
complex parsing logic that isn't needed anymore.

When "grep.patternType" was introduced in 84befcd0a4 (grep: add a
grep.patternType configuration setting, 2012-08-03) we promised that:

 1. You can set "grep.patternType", and "[setting it to] 'default'
    will return to the default matching behavior".

    In that context "the default" meant whatever the configuration
    system specified before that change, i.e. via grep.extendedRegexp.

 2. We'd support the existing "grep.extendedRegexp" option, but ignore
    it when the new "grep.patternType" option is set. We said we'd
    only ignore the older "grep.extendedRegexp" option "when the
    `grep.patternType` option is set to a value other than
    'default'".

In a preceding commit we changed grep_config() to be called after
grep_init(), which means that much of the complexity here can go
away.

As before both "grep.patternType" and "grep.extendedRegexp" are
last-one-wins variable, with "grep.extendedRegexp" yielding to
"grep.patternType", except when "grep.patternType=default".

Note that as the previously added tests indicate this cannot be done
on-the-fly as we see the config variables, without introducing more
state keeping. I.e. if we see:

    -c grep.extendedRegexp=false
    -c grep.patternType=default
    -c extendedRegexp=true

We need to select ERE, since grep.patternType=default unselects that
variable, which normally has higher precedence, but we also need to
select BRE in cases of:

    -c grep.extendedRegexp=true \
    -c grep.extendedRegexp=false

Which would not be the case for this, which select ERE:

    -c grep.patternType=extended \
    -c grep.extendedRegexp=false

Therefore we cannot do this on-the-fly in grep_config without also
introducing tracking variables for not only the pattern type, but what
the source of that pattern type was.

So we need to decide on the pattern after our config was fully
parsed. Let's do that by deferring the decision on the pattern type
until it's time to compile it in compile_regexp().

By that time we've not only parsed the config, but also handled the
command-line options. Those will set "opt.pattern_type_option" (*not*
"opt.extended_regexp_option"!).

At that point all we need to do is see if "grep.patternType" was
UNSPECIFIED in the end (including an explicit "=default"), if so we'll
use the "grep.extendedRegexp" configuration, if any.

See my 07a3d41173 (grep: remove regflags from the public grep_opt
API, 2017-06-29) for addition of the two comments being removed here,
i.e. the complexity noted in that commit is now going away.

1. https://lore.kernel.org/git/patch-v8-09.10-c211bb0c69d-20220118T155211Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-15 18:00:50 -08:00
Ævar Arnfjörð Bjarmason 9725c8dda2 built-ins: trust the "prefix" from run_builtin()
Change code in "builtin/grep.c" and "builtin/ls-tree.c" to trust the
"prefix" passed from "run_builtin()". The "prefix" we get from setup.c
is either going to be NULL or a string of length >0, never "".

So we can drop the "prefix && *prefix" checks added for
"builtin/grep.c" in 0d042fecf2 (git-grep: show pathnames relative to
the current directory, 2006-08-11), and for "builtin/ls-tree.c" in
a69dd585fc (ls-tree: chomp leading directories when run from a
subdirectory, 2005-12-23).

As seen in code in revision.c that was added in cd676a5136 (diff
--relative: output paths as relative to the current subdirectory,
2008-02-12) we already have existing code that does away with this
assertion.

This makes it easier to reason about a subsequent change to the
"prefix_length" code in grep.c in a subsequent commit, and since we're
going to the trouble of doing that let's leave behind an assert() to
promise this to any future callers.

For "builtin/grep.c" it would be painful to pass the "prefix" down the
callchain of:

    cmd_grep -> grep_tree -> grep_submodule -> grep_cache -> grep_oid ->
    grep_source_name

So for the code that needs it in grep_source_name() let's add a
"grep_prefix" variable similar to the existing "ls_tree_prefix".

While at it let's move the code in cmd_ls_tree() around so that we
assign to the "ls_tree_prefix" right after declaring the variables,
and stop assigning to "prefix". We only subsequently used that
variable later in the function after clobbering it. Let's just use our
own "grep_prefix" instead.

Let's also add an assert() in git.c, so that we'll make this promise
about the "prefix" to any current and future callers, as well as to
any readers of the code.

Code history:

 * The strlen() in "grep.c" hasn't been used since 493b7a08d8 (grep:
   accept relative paths outside current working directory, 2009-09-05).

   When that code was added in 0d042fecf2 (git-grep: show pathnames
   relative to the current directory, 2006-08-11) we used the length.

   But since 493b7a08d8 we haven't used it for anything except a
   boolean check that we could have done on the "prefix" member
   itself.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-15 18:00:50 -08:00
Alex Henrie 087c745833 log: add a --no-graph option
It's useful to be able to countermand a previous --graph option, for
example if `git log --graph` is run via an alias.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-11 10:06:41 -08:00
Alex Henrie dccf6c16f1 log: fix memory leak if --graph is passed multiple times
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-11 10:06:40 -08:00
Jerry Zhang 9d505b7b49 git-rev-list: add --exclude-first-parent-only flag
It is useful to know when a branch first diverged in history
from some integration branch in order to be able to enumerate
the user's local changes. However, these local changes can
include arbitrary merges, so it is necessary to ignore this
merge structure when finding the divergence point.

In order to do this, teach the "rev-list" family to accept
"--exclude-first-parent-only", which restricts the traversal
of excluded commits to only follow first parent links.

   -A-----E-F-G--main
     \   / /
      B-C-D--topic

In this example, the goal is to return the set {B, C, D} which
represents a topic branch that has been merged into main branch.
`git rev-list topic ^main` will end up returning no commits
since excluding main will end up traversing the commits on topic
as well. `git rev-list --exclude-first-parent-only topic ^main`
however will return {B, C, D} as desired.

Add docs for the new flag, and clarify the doc for --first-parent
to indicate that it applies to traversing the set of included
commits only.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-12 11:08:42 -08:00
Junio C Hamano c17de5a505 Merge branch 'ja/i18n-similar-messages'
Similar message templates have been consolidated so that
translators need to work on fewer number of messages.

* ja/i18n-similar-messages:
  i18n: turn even more messages into "cannot be used together" ones
  i18n: ref-filter: factorize "%(foo) atom used without %(bar) atom"
  i18n: factorize "--foo outside a repository"
  i18n: refactor "unrecognized %(foo) argument" strings
  i18n: factorize "no directory given for --foo"
  i18n: factorize "--foo requires --bar" and the like
  i18n: tag.c factorize i18n strings
  i18n: standardize "cannot open" and "cannot read"
  i18n: turn "options are incompatible" into "cannot be used together"
  i18n: refactor "%s, %s and %s are mutually exclusive"
  i18n: refactor "foo and bar are mutually exclusive"
2022-01-10 11:52:56 -08:00
Junio C Hamano 2043ce828e Merge branch 'rs/log-invert-grep-with-headers'
"git log --invert-grep --author=<name>" used to exclude commits
written by the given author, but now "--invert-grep" only affects
the matches made by the "--grep=<pattern>" option.

* rs/log-invert-grep-with-headers:
  log: let --invert-grep only invert --grep
2022-01-05 14:01:30 -08:00
Jean-Noël Avila 6fa00ee843 i18n: factorize "--foo requires --bar" and the like
They are all replaced by "the option '%s' requires '%s'", which is a
new string but replaces 17 previous unique strings.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-05 13:31:00 -08:00
Jean-Noël Avila 12909b6b8a i18n: turn "options are incompatible" into "cannot be used together"
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-05 13:29:23 -08:00
Junio C Hamano 5a4069a1d8 Merge branch 'jc/c99-var-decl-in-for-loop'
Weather balloon to find compilers that do not grok variable
declaration in the for() loop.

* jc/c99-var-decl-in-for-loop:
  revision: use C99 declaration of variable in for() loop
2021-12-21 15:03:15 -08:00
René Scharfe 794c000267 log: let --invert-grep only invert --grep
The option --invert-grep is documented to filter out commits whose
messages match the --grep filters.  However, it also affects the
header matches (--author, --committer), which is not intended.

Move the handling of that option to grep.c, as only the code there can
distinguish between matches in the header from those in the message
body.  If --invert-grep is given then enable extended expressions (not
the regex type, we just need git grep's --not to work), negate the body
patterns and check if any of them match by piggy-backing on the
collect_hits mechanism of grep_source_1().

Collecting the matches in struct grep_opt is a bit iffy, but with
"last_shown" we have a precedent for writing state information to that
struct.

Reported-by: Dotan Cohen <dotancohen@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-17 14:13:08 -08:00
Junio C Hamano 44ba10d671 revision: use C99 declaration of variable in for() loop
There are certain C99 features that might be nice to use in our code
base, but we've hesitated to do so in order to avoid breaking
compatibility with older compilers. But we don't actually know if
people are even using pre-C99 compilers these days.

One way to figure that out is to introduce a very small use of a
feature, and see if anybody complains, and we've done so to probe
the portability for a few features like "trailing comma in enum
declaration", "designated initializer for struct", and "designated
initializer for array".  A few years ago, we tried to use a handy

    for (int i = 0; i < n; i++)
	use(i);

to introduce a new variable valid only in the loop, but found that
some compilers we cared about didn't like it back then.  Two years
is a long-enough time, so let's try it again.

If this patch can survive a few releases without complaint, then we
can feel more confident that variable declaration in for() loop is
supported by the compilers our user base use.  And if we do get
complaints, then we'll have gained some data and we can easily
revert this patch.

Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-03 10:16:00 -08:00
Junio C Hamano 8996d68ac7 Merge branch 'ps/connectivity-optim'
Regression fix.

* ps/connectivity-optim:
  Revert "connected: do not sort input revisions"
2021-11-12 15:29:24 -08:00
Junio C Hamano a7df4f52af Revert "connected: do not sort input revisions"
This reverts commit f45022dc2f,
as this is like breakage in the traversal more likely.  In a
history with 10 single strand of pearls,

   1-->2-->3--...->7-->8-->9-->10

asking "rev-list --unsorted-input 1 10 --not 9 8 7 6 5 4" fails to
paint the bottom 1 uninteresting as the traversal stops, without
completing the propagation of uninteresting bit starting at 4 down
through 3 and 2 to 1.
2021-11-11 12:34:41 -08:00
Junio C Hamano 404c4a5462 Merge branch 'ab/designated-initializers'
Code clean-up.

* ab/designated-initializers:
  cbtree.h: define cb_init() in terms of CBTREE_INIT
  *.h: move some *_INIT to designated initializers
  *.h _INIT macros: don't specify fields equal to 0
  *.[ch] *_INIT macros: use { 0 } for a "zero out" idiom
  submodule-config.h: remove unused SUBMODULE_INIT macro
2021-10-11 10:21:48 -07:00
Junio C Hamano f6c075ad71 Merge branch 'jk/ref-paranoia'
The ref iteration code used to optionally allow dangling refs to be
shown, which has been tightened up.

* jk/ref-paranoia:
  refs: drop "broken" flag from for_each_fullref_in()
  ref-filter: drop broken-ref code entirely
  ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN
  repack, prune: drop GIT_REF_PARANOIA settings
  refs: turn on GIT_REF_PARANOIA by default
  refs: omit dangling symrefs when using GIT_REF_PARANOIA
  refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag
  refs-internal.h: reorganize DO_FOR_EACH_* flag documentation
  refs-internal.h: move DO_FOR_EACH_* flags next to each other
  t5312: be more assertive about command failure
  t5312: test non-destructive repack
  t5312: create bogus ref as necessary
  t5312: drop "verbose" helper
  t5600: provide detached HEAD for corruption failures
  t5516: don't use HEAD ref for invalid ref-deletion tests
  t7900: clean up some more broken refs
2021-10-11 10:21:47 -07:00
Junio C Hamano 921c795c25 Merge branch 'jt/add-submodule-odb-clean-up'
More code paths that use the hack to add submodule's object
database to the set of alternate object store have been cleaned up.

* jt/add-submodule-odb-clean-up:
  revision: remove "submodule" from opt struct
  repository: support unabsorbed in repo_submodule_init
  submodule: remove unnecessary unabsorbed fallback
2021-10-06 13:40:11 -07:00
Ævar Arnfjörð Bjarmason 9865b6e6a4 *.[ch] *_INIT macros: use { 0 } for a "zero out" idiom
In C it isn't required to specify that all members of a struct are
zero'd out to 0, NULL or '\0', just providing a "{ 0 }" will
accomplish that.

Let's also change code that provided N zero'd fields to just
provide one, and change e.g. "{ NULL }" to "{ 0 }" for
consistency. I.e. even if the first member is a pointer let's use "0"
instead of "NULL". The point of using "0" consistently is to pick one,
and to not have the reader wonder why we're not using the same pattern
everywhere.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 14:47:59 -07:00
Jeff King 67985e4e4a refs: drop "broken" flag from for_each_fullref_in()
No callers pass in anything but "0" here. Likewise to our sibling
functions. Note that some of them ferry along the flag, but none of
their callers pass anything but "0" either.

Nor is anybody likely to change that. Callers which really want to see
all of the raw refs use for_each_rawref(). And anybody interested in
iterating a subset of the refs will likely be happy to use the
now-default behavior of showing broken refs, but omitting dangling
symlinks.

So we can get rid of this whole feature.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
Jonathan Tan 10a0d6ae64 revision: remove "submodule" from opt struct
Clean up a TODO in revision.h by removing the "submodule" field from
struct setup_revision_opt. This field is only used to specify the ref
store to use, so use rev_info->repo to determine the ref store instead.

The only users of this field are merge-ort.c and merge-recursive.c.
However, both these files specify the superproject as rev_info->repo and
the submodule as setup_revision_opt->submodule. In order to be able to
pass the submodule as rev_info->repo, all commits must be parsed with
the submodule explicitly specified; this patch does that as well. (An
incremental solution in which only some commits are parsed with explicit
submodule will not work, because if the same commit is parsed twice in
different repositories, there will be 2 heap-allocated object structs
corresponding to that commit, and any flag set by the revision walking
mechanism on one of them will not be reflected onto the other.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 14:09:30 -07:00
Patrick Steinhardt f559d6d45e revision: avoid hitting packfiles when commits are in commit-graph
When queueing references in git-rev-list(1), we try to optimize parsing
of commits via the commit-graph. To do so, we first look up the object's
type, and if it is a commit we call `repo_parse_commit()` instead of
`parse_object()`. This is quite inefficient though given that we're
always uncompressing the object header in order to determine the type.
Instead, we can opportunistically search the commit-graph for the object
ID: in case it's found, we know it's a commit and can directly fill in
the commit object without having to uncompress the object header.

Expose a new function `lookup_commit_in_graph()`, which tries to find a
commit in the commit-graph by ID, and convert `get_reference()` to use
this function. This provides a big performance win in cases where we
load references in a repository with lots of references pointing to
commits. The following has been executed in a real-world repository with
about 2.2 million refs:

    Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      4.458 s ±  0.044 s    [User: 4.115 s, System: 0.342 s]
      Range (min … max):    4.409 s …  4.534 s    10 runs

    Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      3.089 s ±  0.015 s    [User: 2.768 s, System: 0.321 s]
      Range (min … max):    3.061 s …  3.105 s    10 runs

    Summary
      'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran
        1.44 ± 0.02 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-09 09:51:12 -07:00
Patrick Steinhardt bf9c0cbddb revision: stop retrieving reference twice
When queueing up references for the revision walk, `handle_one_ref()`
will resolve the reference's object ID via `get_reference()` and then
queue the ID as pending object via `add_pending_oid()`. But given that
`add_pending_oid()` is only a thin wrapper around `add_pending_object()`
which fist calls `get_reference()`, we effectively resolve the reference
twice and thus duplicate some of the work.

Fix the issue by instead calling `add_pending_object()` directly, which
takes the already-resolved object as input. In a repository with lots of
refs, this translates into a near 10% speedup:

    Benchmark #1: HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      5.015 s ±  0.038 s    [User: 4.698 s, System: 0.316 s]
      Range (min … max):    4.970 s …  5.089 s    10 runs

    Benchmark #2: HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      4.606 s ±  0.029 s    [User: 4.260 s, System: 0.345 s]
      Range (min … max):    4.565 s …  4.657 s    10 runs

    Summary
      'HEAD: rev-list --unsorted-input --objects --quiet --not --all --not $newrev' ran
        1.09 ± 0.01 times faster than 'HEAD~: rev-list --unsorted-input --objects --quiet --not --all --not $newrev'

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-09 09:51:12 -07:00
Patrick Steinhardt f45022dc2f connected: do not sort input revisions
In order to compute whether objects reachable from a set of tips are all
connected, we do a revision walk with these tips as positive references
and `--not --all`. `--not --all` will cause the revision walk to load
all preexisting references as uninteresting, which can be very expensive
in repositories with many references.

Benchmarking the git-rev-list(1) command highlights that by far the most
expensive single phase is initial sorting of the input revisions: after
all references have been loaded, we first sort commits by author date.
In a real-world repository with about 2.2 million references, it makes
up about 40% of the total runtime of git-rev-list(1).

Ultimately, the connectivity check shouldn't really bother about the
order of input revisions at all. We only care whether we can actually
walk all objects until we hit the cut-off point. So sorting the input is
a complete waste of time.

Introduce a new "--unsorted-input" flag to git-rev-list(1) which will
cause it to not sort the commits and adjust the connectivity check to
always pass the flag. This results in the following speedups, executed
in a clone of gitlab-org/gitlab [1]:

    Benchmark #1: git rev-list  --objects --quiet --not --all --not $(cat newrev)
      Time (mean ± σ):      7.639 s ±  0.065 s    [User: 7.304 s, System: 0.335 s]
      Range (min … max):    7.543 s …  7.742 s    10 runs

    Benchmark #2: git rev-list --unsorted-input --objects --quiet --not --all --not $newrev
      Time (mean ± σ):      4.995 s ±  0.044 s    [User: 4.657 s, System: 0.337 s]
      Range (min … max):    4.909 s …  5.048 s    10 runs

    Summary
      'git rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev)' ran
        1.53 ± 0.02 times faster than 'git rev-list  --objects --quiet --not --all --not $newrev'

[1]: https://gitlab.com/gitlab-org/gitlab.git. Note that not all refs
     are visible to clients.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-09 09:51:12 -07:00