Commit graph

1039 commits

Author SHA1 Message Date
Junio C Hamano a6dce0afc3 Merge branch 'jk/free-commit-buffer-of-skipped-commits'
The code forgot to discard unnecessary in-core commit buffer data
for commits that "git log --skip=<number>" traversed but omitted
from the output, which has been corrected.

* jk/free-commit-buffer-of-skipped-commits:
  revision: free commit buffers for skipped commits
2024-09-10 13:16:41 -07:00
Jeff King 6bd2ae67a5 revision: free commit buffers for skipped commits
In git-log we leave the save_commit_buffer flag set to "1", which tells
the commit parsing code to store the object content after it has parsed
it to find parents, tree, etc. That lets us reuse the contents for
pretty-printing the commit in the output. And then after printing each
commit, we call free_commit_buffer(), since we don't need it anymore.

But some options may cause us to traverse commits which are not part of
the output. And so git-log does not see them at all, and doesn't free
them. One such case is something like:

  git log -n 1000 --skip=1000000

which will churn through a million commits, before showing only a
thousand. We loop through these inside get_revision(), without freeing
the contents. As a result, we end up storing the object data for those
million commits simultaneously.

We should free the stored buffers (if any) for those commits as we skip
over them, which is what this patch does. Running the above command in
linux.git drops the peak heap usage from ~1.1GB to ~200MB, according to
valgrind/massif. (I thought we might get an even bigger improvement, but
the remaining memory is going to commit/tree structs, which we do hold
on to forever).

Note that this problem doesn't occur if:

  - you're running a git-rev-list without a --format parameter; it turns
    off save_commit_buffer by default, since it only output the object
    id

  - you've built a commit-graph file, since in that case we'd use the
    optimized graph data instead of the initial parse, and then do a
    lazy parse for commits we're actually going to output

There are probably some other option combinations that can likewise
end up with useless stored commit buffers. For example, if you ask for
"foo..bar", then we'll have to walk down to the merge base, and
everything on the "foo" side won't be shown. Tuning the "save" behavior
to handle that might be tricky (I guess maybe drop buffers for anything
we mark as UNINTERESTING?). And in the long run, the right solution here
is probably to make sure the commit-graph is built (since it fixes the
memory problem _and_ drastically reduces CPU usage).

But since this "--skip" case is an easy one-liner, it's worth fixing in
the meantime. It should be OK to make this call even if there is no
saved buffer (e.g., because save_commit_buffer=0, or because a
commit-graph was used), since it's O(1) to look up the buffer and is a
noop if it isn't present. I verified by running the above command after
"git commit-graph write --reachable", and it takes the same time with
and without this patch.

Reported-by: Yuri Karnilaev <karnilaev@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-30 14:03:00 -07:00
Junio C Hamano 5e56a39e6a Merge branch 'ps/config-wo-the-repository'
Use of API functions that implicitly depend on the_repository
object in the config subsystem has been rewritten to pass a
repository object through the callchain.

* ps/config-wo-the-repository:
  config: hide functions using `the_repository` by default
  global: prepare for hiding away repo-less config functions
  config: don't depend on `the_repository` with branch conditions
  config: don't have setters depend on `the_repository`
  config: pass repo to functions that rename or copy sections
  config: pass repo to `git_die_config()`
  config: pass repo to `git_config_get_expiry_in_days()`
  config: pass repo to `git_config_get_expiry()`
  config: pass repo to `git_config_get_max_percent_split_change()`
  config: pass repo to `git_config_get_split_index()`
  config: pass repo to `git_config_get_index_threads()`
  config: expose `repo_config_clear()`
  config: introduce missing setters that take repo as parameter
  path: hide functions using `the_repository` by default
  path: stop relying on `the_repository` in `worktree_git_path()`
  path: stop relying on `the_repository` when reporting garbage
  hooks: remove implicit dependency on `the_repository`
  editor: do not rely on `the_repository` for interactive edits
  path: expose `do_git_common_path()` as `repo_common_pathv()`
  path: expose `do_git_path()` as `repo_git_pathv()`
2024-08-23 09:02:34 -07:00
Patrick Steinhardt a973f60dc7 path: stop relying on the_repository in worktree_git_path()
When not provided a worktree, then `worktree_git_path()` will fall back
to returning a path relative to the main repository. In this case, we
implicitly rely on `the_repository` to derive the path. Remove this
dependency by passing a `struct repository` as parameter.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-13 10:01:01 -07:00
John Cai e8207717f1 refs: add referent to each_ref_fn
Add a parameter to each_ref_fn so that callers to the ref APIs
that use this function as a callback can have acess to the
unresolved value of a symbolic ref.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-09 08:47:34 -07:00
Junio C Hamano 3997614c24 Merge branch 'ps/leakfixes-more'
More memory leaks have been plugged.

* ps/leakfixes-more: (29 commits)
  builtin/blame: fix leaking ignore revs files
  builtin/blame: fix leaking prefixed paths
  blame: fix leaking data for blame scoreboards
  line-range: plug leaking find functions
  merge: fix leaking merge bases
  builtin/merge: fix leaking `struct cmdnames` in `get_strategy()`
  sequencer: fix memory leaks in `make_script_with_merges()`
  builtin/clone: plug leaking HEAD ref in `wanted_peer_refs()`
  apply: fix leaking string in `match_fragment()`
  sequencer: fix leaking string buffer in `commit_staged_changes()`
  commit: fix leaking parents when calling `commit_tree_extended()`
  config: fix leaking "core.notesref" variable
  rerere: fix various trivial leaks
  builtin/stash: fix leak in `show_stash()`
  revision: free diff options
  builtin/log: fix leaking commit list in git-cherry(1)
  merge-recursive: fix memory leak when finalizing merge
  builtin/merge-recursive: fix leaking object ID bases
  builtin/difftool: plug memory leaks in `run_dir_diff()`
  object-name: free leaking object contexts
  ...
2024-07-08 14:53:10 -07:00
Junio C Hamano ecf7fc600a Merge branch 'tb/path-filter-fix'
The Bloom filter used for path limited history traversal was broken
on systems whose "char" is unsigned; update the implementation and
bump the format version to 2.

* tb/path-filter-fix:
  bloom: introduce `deinit_bloom_filters()`
  commit-graph: reuse existing Bloom filters where possible
  object.h: fix mis-aligned flag bits table
  commit-graph: new Bloom filter version that fixes murmur3
  commit-graph: unconditionally load Bloom filters
  bloom: prepare to discard incompatible Bloom filters
  bloom: annotate filters with hash version
  repo-settings: introduce commitgraph.changedPathsVersion
  t4216: test changed path filters with high bit paths
  t/helper/test-read-graph: implement `bloom-filters` mode
  bloom.h: make `load_bloom_filter_from_graph()` public
  t/helper/test-read-graph.c: extract `dump_graph_info()`
  gitformat-commit-graph: describe version 2 of BDAT
  commit-graph: ensure Bloom filters are read with consistent settings
  revision.c: consult Bloom filters for root commits
  t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
2024-07-08 14:53:10 -07:00
Junio C Hamano 7b472da915 Merge branch 'ps/use-the-repository'
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help
transition the codebase to rely less on the availability of the
singleton the_repository instance.

* ps/use-the-repository:
  hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE`
  t/helper: remove dependency on `the_repository` in "proc-receive"
  t/helper: fix segfault in "oid-array" command without repository
  t/helper: use correct object hash in partial-clone helper
  compat/fsmonitor: fix socket path in networked SHA256 repos
  replace-object: use hash algorithm from passed-in repository
  protocol-caps: use hash algorithm from passed-in repository
  oidset: pass hash algorithm when parsing file
  http-fetch: don't crash when parsing packfile without a repo
  hash-ll: merge with "hash.h"
  refs: avoid include cycle with "repository.h"
  global: introduce `USE_THE_REPOSITORY_VARIABLE` macro
  hash: require hash algorithm in `empty_tree_oid_hex()`
  hash: require hash algorithm in `is_empty_{blob,tree}_oid()`
  hash: make `is_null_oid()` independent of `the_repository`
  hash: convert `oidcmp()` and `oideq()` to compare whole hash
  global: ensure that object IDs are always padded
  hash: require hash algorithm in `oidread()` and `oidclr()`
  hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`
  hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
2024-07-02 09:59:00 -07:00
Taylor Blau 1343c89313 revision.c: consult Bloom filters for root commits
The commit-graph stores changed-path Bloom filters which represent the
set of paths included in a tree-level diff between a commit's root tree
and that of its parent.

When a commit has no parents, the tree-diff is computed against that
commit's root tree and the empty tree. In other words, every path in
that commit's tree is stored in the Bloom filter (since they all appear
in the diff).

Consult these filters during pathspec-limited traversals in the function
`rev_same_tree_as_empty()`. Doing so yields a performance improvement
where we can avoid enumerating the full set of paths in a parentless
commit's root tree when we know that the path(s) of interest were not
listed in that commit's changed-path Bloom filter.

Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Original-patch-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-25 13:52:05 -07:00
Junio C Hamano 107ed55103 Merge branch 'jc/worktree-git-path'
Code cleanup.

* jc/worktree-git-path:
  worktree_git_path(): move the declaration to path.h
2024-06-24 16:39:15 -07:00
Patrick Steinhardt e7da938570 global: introduce USE_THE_REPOSITORY_VARIABLE macro
Use of the `the_repository` variable is deprecated nowadays, and we
slowly but steadily convert the codebase to not use it anymore. Instead,
callers should be passing down the repository to work on via parameters.

It is hard though to prove that a given code unit does not use this
variable anymore. The most trivial case, merely demonstrating that there
is no direct use of `the_repository`, is already a bit of a pain during
code reviews as the reviewer needs to manually verify claims made by the
patch author. The bigger problem though is that we have many interfaces
that implicitly rely on `the_repository`.

Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code
units to opt into usage of `the_repository`. The intent of this macro is
to demonstrate that a certain code unit does not use this variable
anymore, and to keep it from new dependencies on it in future changes,
be it explicit or implicit

For now, the macro only guards `the_repository` itself as well as
`the_hash_algo`. There are many more known interfaces where we have an
implicit dependency on `the_repository`, but those are not guarded at
the current point in time. Over time though, we should start to add
guards as required (or even better, just remove them).

Define the macro as required in our code units. As expected, most of our
code still relies on the global variable. Nearly all of our builtins
rely on the variable as there is no way yet to pass `the_repository` to
their entry point. For now, declare the macro in "biultin.h" to keep the
required changes at least a little bit more contained.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-14 10:26:33 -07:00
Patrick Steinhardt a90a089611 revision: free diff options
There is a todo comment in `release_revisions()` that mentions that we
need to free the diff options, which was added via 54c8a7c379 (revisions
API: add a TODO for diff_free(&revs->diffopt), 2022-04-14). Releasing
the diff options wasn't quite feasible at that time because some call
sites rely on its contents to remain even after the revisions have been
released.

In fact, there really only are a couple of callsites that misbehave
here:

  - `cmd_shortlog()` releases the revisions, but continues to access its
    file pointer.

  - `do_diff_cache()` creates a shallow copy of `struct diff_options`,
    but does not set the `no_free` member. Consequently, we end up
    releasing resources of the caller-provided diff options.

  - `diff_free()` and friends do not play nice when being called
    multiple times as they don't unset data structures that they have
    just released.

Fix all of those cases and enable the call to `diff_free()`, which plugs
a bunch of memory leaks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11 13:15:06 -07:00
Patrick Steinhardt f87c55c264 object-name: free leaking object contexts
While it is documented in `struct object_context::path` that this
variable needs to be released by the caller, this fact is rather easy to
miss given that we do not ever provide a function to release the object
context. And of course, while some callers dutifully release the path,
many others don't.

Introduce a new `object_context_release()` function that releases the
path. Convert callsites that used to free the path to use that new
function and add missing calls to callsites that were leaking memory.
Refactor those callsites as required to have a single return path, only.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11 13:15:05 -07:00
Patrick Steinhardt 9748537437 revision: fix leaking display notes
We never free the display notes options embedded into `struct revision`.
Implement a new function `release_display_notes()` that we can call in
`release_revisions()` to fix this.

There is another gotcha here though: we play some games with the string
list used to track extra notes refs, where we sometimes set the bit that
indicates that strings should be strdup'd and sometimes unset it. This
dance is done to avoid a copy of an already-allocated string when we
call `enable_ref_display_notes()`. But this dance is rather pointless as
we can instead call `string_list_append_nodup()` to transfer ownership
of the allocated string to the list.

Refactor the code to do so and drop the `strdup_strings` dance.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11 13:15:05 -07:00
Patrick Steinhardt 56931c4d89 revision: fix memory leak when reversing revisions
When reversing revisions in a rev walk, `get_revision()` will allocate a
new commit list and assign it to `revs->commits`. It does not free the
old list though, which makes it leak. Fix this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-11 13:15:04 -07:00
Junio C Hamano bf6a86236e worktree_git_path(): move the declaration to path.h
The definition of this function is in path.c but its declaration is
in worktree.h, which is something unexpected.  The function is
explained as "Similar to git_path()"; declaring it next to where
git_path() is declared would make more sense.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-08 11:42:37 -07:00
Patrick Steinhardt 844d190677 revision: always store allocated strings in output encoding
The `git_log_output_encoding` variable can be set via the `--encoding=`
option. When doing so, we conditionally either assign it to the passed
value, or if the value is "none" we assign it the empty string.
Depending on which of the both code paths we pick though, the variable
may end up being assigned either an allocated string or a string
constant.

This is somewhat risky and may easily lead to bugs when a different code
path may want to reassign a new value to it, freeing the previous value.
We already to this when parsing the "i18n.logoutputencoding" config in
`git_default_i18n_config()`. But because the config is typically parsed
before we parse command line options this has been fine so far.

Regardless of that, safeguard the code such that the variable always
contains an allocated string. While at it, also free the old value in
case there was any to plug a potential memory leak.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07 10:30:54 -07:00
Patrick Steinhardt 2e5c4758b7 cocci: apply rules to rewrite callers of "refs" interfaces
Apply the rules that rewrite callers of "refs" interfaces to explicitly
pass `struct ref_store`. The resulting patch has been applied with the
`--whitespace=fix` option.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-07 10:06:59 -07:00
Junio C Hamano 1002f28a52 Merge branch 'eb/hash-transition'
Work to support a repository that work with both SHA-1 and SHA-256
hash algorithms has started.

* eb/hash-transition: (30 commits)
  t1016-compatObjectFormat: add tests to verify the conversion between objects
  t1006: test oid compatibility with cat-file
  t1006: rename sha1 to oid
  test-lib: compute the compatibility hash so tests may use it
  builtin/ls-tree: let the oid determine the output algorithm
  object-file: handle compat objects in check_object_signature
  tree-walk: init_tree_desc take an oid to get the hash algorithm
  builtin/cat-file: let the oid determine the output algorithm
  rev-parse: add an --output-object-format parameter
  repository: implement extensions.compatObjectFormat
  object-file: update object_info_extended to reencode objects
  object-file-convert: convert commits that embed signed tags
  object-file-convert: convert commit objects when writing
  object-file-convert: don't leak when converting tag objects
  object-file-convert: convert tag objects when writing
  object-file-convert: add a function to convert trees between algorithms
  object: factor out parse_mode out of fast-import and tree-walk into in object.h
  cache: add a function to read an OID of a specific algorithm
  tag: sign both hashes
  commit: export add_header_signature to support handling signatures on tags
  ...
2024-03-28 14:13:50 -07:00
Junio C Hamano 8e663afb95 Merge branch 'as/option-names-in-messages'
Error message updates.

* as/option-names-in-messages:
  revision.c: trivial fix to message
  builtin/clone.c: trivial fix of message
  builtin/remote.c: trivial fix of error message
  transport-helper.c: trivial fix of error message
2024-03-15 16:05:59 -07:00
Junio C Hamano 7745f92507 Merge branch 'js/merge-base-with-missing-commit'
Make sure failure return from merge_bases_many() is properly caught.

* js/merge-base-with-missing-commit:
  merge-ort/merge-recursive: do report errors in `merge_submodule()`
  merge-recursive: prepare for `merge_submodule()` to report errors
  commit-reach(repo_get_merge_bases_many_dirty): pass on errors
  commit-reach(repo_get_merge_bases_many): pass on "missing commits" errors
  commit-reach(get_octopus_merge_bases): pass on "missing commits" errors
  commit-reach(repo_get_merge_bases): pass on "missing commits" errors
  commit-reach(get_merge_bases_many_0): pass on "missing commits" errors
  commit-reach(merge_bases_many): pass on "missing commits" errors
  commit-reach(paint_down_to_common): start reporting errors
  commit-reach(paint_down_to_common): prepare for handling shallow commits
  commit-reach(repo_in_merge_bases_many): report missing commits
  commit-reach(repo_in_merge_bases_many): optionally expect missing commits
  commit-reach(paint_down_to_common): plug two memory leaks
2024-03-11 14:12:30 -07:00
Junio C Hamano 56d6084560 Merge branch 'jk/upload-pack-bounded-resources'
Various parts of upload-pack has been updated to bound the resource
consumption relative to the size of the repository to protect from
abusive clients.

* jk/upload-pack-bounded-resources:
  upload-pack: free tree buffers after parsing
  upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places
  upload-pack: always turn off save_commit_buffer
  upload-pack: disallow object-info capability by default
  upload-pack: accept only a single packfile-uri line
  upload-pack: use a strmap for want-ref lines
  upload-pack: use oidset for deepen_not list
  upload-pack: switch deepen-not list to an oid_array
  upload-pack: drop separate v2 "haves" array
2024-03-07 15:59:42 -08:00
Junio C Hamano 6a887bdd92 Merge branch 'ml/log-merge-with-cherry-pick-and-other-pseudo-heads'
"git log --merge" learned to pay attention to CHERRY_PICK_HEAD and
other kinds of *_HEAD pseudorefs.

* ml/log-merge-with-cherry-pick-and-other-pseudo-heads:
  revision: implement `git log --merge` also for rebase/cherry-pick/revert
  revision: ensure MERGE_HEAD is a ref in prepare_show_merge
2024-03-07 15:59:41 -08:00
Junio C Hamano 76d1cd8e5e Merge branch 'cc/rev-list-allow-missing-tips'
"git rev-list --missing=print" has learned to optionally take
"--allow-missing-tips", which allows the objects at the starting
points to be missing.

* cc/rev-list-allow-missing-tips:
  revision: fix --missing=[print|allow*] for annotated tags
  rev-list: allow missing tips with --missing=[print|allow*]
  t6022: fix 'test' style and 'even though' typo
  oidset: refactor oidset_insert_from_set()
  revision: clarify a 'return NULL' in get_reference()
2024-03-07 15:59:40 -08:00
Alexander Shopov 781fb7b4c2 revision.c: trivial fix to message
ancestry-path is an option, not a command - mark it as such.
This brings it in sync with the rest of usages in the file

Signed-off-by: Alexander Shopov <ash@kambanaria.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-05 14:11:56 -08:00
Johannes Schindelin 76e2a09999 commit-reach(repo_get_merge_bases): pass on "missing commits" errors
The `merge_bases_many()` function was just taught to indicate parsing
errors, and now the `repo_get_merge_bases()` function (which is also
surfaced via the `repo_get_merge_bases()` macro) is aware of that, too.

Naturally, there are a lot of callers that need to be adjusted now, too.

Next step: adjust the callers of `get_octopus_merge_bases()`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-29 08:06:01 -08:00
Jeff King 6cd05e768b upload-pack: free tree buffers after parsing
When a client sends us a "want" or "have" line, we call parse_object()
to get an object struct. If the object is a tree, then the parsed state
means that tree->buffer points to the uncompressed contents of the tree.
But we don't really care about it. We only really need to parse commits
and tags; for trees and blobs, the important output is just a "struct
object" with the correct type.

But much worse, we do not ever free that tree buffer. It's not leaked in
the traditional sense, in that we still have a pointer to it from the
global object hash. But if the client requests many trees, we'll hold
all of their contents in memory at the same time.

Nobody really noticed because it's rare for clients to directly request
a tree. It might happen for a lightweight tag pointing straight at a
tree, or it might happen for a "tree:depth" partial clone filling in
missing trees.

But it's also possible for a malicious client to request a lot of trees,
causing upload-pack's memory to balloon. For example, without this
patch, requesting every tree in git.git like:

  pktline() {
    local msg="$*"
    printf "%04x%s\n" $((1+4+${#msg})) "$msg"
  }

  want_trees() {
    pktline command=fetch
    printf 0001
    git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
      while read oid type; do
        test "$type" = "tree" || continue
        pktline want $oid
      done
      pktline done
      printf 0000
  }

  want_trees | GIT_PROTOCOL=version=2 valgrind --tool=massif ./git upload-pack . >/dev/null

shows a peak heap usage of ~3.7GB. Which is just about the sum of the
sizes of all of the uncompressed trees. For linux.git, it's closer to
17GB.

So the obvious thing to do is to call free_tree_buffer() after we
realize that we've parsed a tree. We know that upload-pack won't need it
later. But let's push the logic into parse_object_with_flags(), telling
it to discard the tree buffer immediately. There are two reasons for
this. One, all of the relevant call-sites already call the with_options
variant to pass the SKIP_HASH flag. So it actually ends up as less code
than manually free-ing in each spot. And two, it enables an extra
optimization that I'll discuss below.

I've touched all of the sites that currently use SKIP_HASH in
upload-pack. That drops the peak heap of the upload-pack invocation
above from 3.7GB to ~24MB.

I've also modified the caller in get_reference(); a partial clone
benefits from its use in pack-objects for the reasons given in
0bc2557951 (upload-pack: skip parse-object re-hashing of "want" objects,
2022-09-06), where we were measuring blob requests. But note that the
results of get_reference() are used for traversing, as well; so we
really would _eventually_ use the tree contents. That makes this at
first glance a space/time tradeoff: we won't hold all of the trees in
memory at once, but we'll have to reload them each when it comes time to
traverse.

And here's where our extra optimization comes in. If the caller is not
going to immediately look at the tree contents, and it doesn't care
about checking the hash, then parse_object() can simply skip loading the
tree entirely, just like we do for blobs! And now it's not a space/time
tradeoff in get_reference() anymore. It's just a lazy-load: we're
delaying reading the tree contents until it's time to actually traverse
them one by one.

And of course for upload-pack, this optimization means we never load the
trees at all, saving lots of CPU time. Timing the "every tree from
git.git" request above shows upload-pack dropping from 32 seconds of CPU
to 19 (the remainder is mostly due to pack-objects actually sending the
pack; timing just the upload-pack portion shows we go from 13s to
~0.28s).

These are all highly gamed numbers, of course. For real-world
partial-clone requests we're saving only a small bit of time in
practice. But it does help harden upload-pack against malicious
denial-of-service attacks.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 14:42:01 -08:00
Michael Lohmann f3fc5d9c91 revision: implement git log --merge also for rebase/cherry-pick/revert
'git log' learned in ae3e5e1ef2 (git log -p --merge [[--] paths...],
2006-07-03) to show commits touching conflicted files in the range
HEAD...MERGE_HEAD, an addition documented in d249b45547 (Document
rev-list's option --merge, 2006-08-04).

It can be useful to look at the commit history to understand what lead
to merge conflicts also for other mergy operations besides merges, like
cherry-pick, revert and rebase.

For rebases and cherry-picks, an interesting range to look at is
HEAD...{REBASE_HEAD,CHERRY_PICK_HEAD}, since even if all the commits
included in that range are not directly part of the 3-way merge,
conflicts encountered during these operations can indeed be caused by
changes introduced in preceding commits on both sides of the history.

For revert, as we are (most likely) reversing changes from a previous
commit, an appropriate range is REVERT_HEAD..HEAD, which is equivalent
to REVERT_HEAD...HEAD and to HEAD...REVERT_HEAD, if we keep HEAD and its
parents on the left side of the range.

As such, adjust the code in prepare_show_merge so it constructs the
range HEAD...$OTHER for OTHER={MERGE_HEAD, CHERRY_PICK_HEAD, REVERT_HEAD
or REBASE_HEAD}. Note that we try these pseudorefs in order, so keep
REBASE_HEAD last since the three other operations can be performed
during a rebase. Note also that in the uncommon case where $OTHER and
HEAD do not share a common ancestor, this will show the complete
histories of both sides since their root commits, which is the same
behaviour as currently happens in that case for HEAD and MERGE_HEAD.

Adjust the documentation of this option accordingly.

Co-authored-by: Johannes Sixt <j6t@kdbg.org>
Co-authored-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Michael Lohmann <mi.al.lohmann@gmail.com>
[jc: tweaked in j6t's precedence fix that tries REBASE_HEAD last]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 10:04:39 -08:00
Michael Lohmann f476143ee6 revision: ensure MERGE_HEAD is a ref in prepare_show_merge
This is done to

 (1) ensure MERGE_HEAD is a ref,
 (2) obtain the oid without any prefixing by refs.c:repo_dwim_ref()
 (3) error out when MERGE_HEAD is a symref.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Lohmann <mi.al.lohmann@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 10:02:46 -08:00
Christian Couder a4324babe6 revision: fix --missing=[print|allow*] for annotated tags
In 9830926c7d (rev-list: add commit object support in `--missing`
option, 2023-10-27) we fixed the `--missing` option in `git rev-list`
so that it works with missing commits, not just blobs/trees.

Unfortunately, such a command was still failing with a "fatal: bad
object <oid>" if it was passed a missing commit, blob or tree as an
argument (before the rev walking even begins). This was fixed in a
recent commit.

That fix still doesn't work when an argument passed to the command is
an annotated tag pointing to a missing commit though. In that case
`git rev-list --missing=...` still errors out with a "fatal: bad
object <oid>" error where <oid> is the object ID of the missing
commit.

Let's fix this issue, and also, while at it, let's add tests not just
for annotated tags but also for regular tags and branches.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 09:28:18 -08:00
Patrick Steinhardt 31f898397b refs: drop unused params from the reflog iterator callback
The ref and reflog iterators share much of the same underlying code to
iterate over the corresponding entries. This results in some weird code
because the reflog iterator also exposes an object ID as well as a flag
to the callback function. Neither of these fields do refer to the reflog
though -- they refer to the corresponding ref with the same name. This
is quite misleading. In practice at least the object ID cannot really be
implemented in any other way as a reflog does not have a specific object
ID in the first place. This is further stressed by the fact that none of
the callbacks except for our test helper make use of these fields.

Split up the infrastucture so that ref and reflog iterators use separate
callback signatures. This allows us to drop the nonsensical fields from
the reflog iterator.

Note that internally, the backends still use the same shared infra to
iterate over both types. As the backends should never end up being
called directly anyway, this is not much of a problem and thus kept
as-is for simplicity's sake.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-21 09:58:06 -08:00
Christian Couder 7b644c8c5a rev-list: allow missing tips with --missing=[print|allow*]
In 9830926c7d (rev-list: add commit object support in `--missing`
option, 2023-10-27) we fixed the `--missing` option in `git rev-list`
so that it works with with missing commits, not just blobs/trees.

Unfortunately, such a command would still fail with a "fatal: bad
object <oid>" if it is passed a missing commit, blob or tree as an
argument (before the rev walking even begins).

When such a command is used to find the dependencies of some objects,
for example the dependencies of quarantined objects (see the
"QUARANTINE ENVIRONMENT" section in the git-receive-pack(1)
documentation), it would be better if the command would instead
consider such missing objects, especially commits, in the same way as
other missing objects.

If, for example `--missing=print` is used, it would be nice for some
use cases if the missing tips passed as arguments were reported in
the same way as other missing objects instead of the command just
failing.

We could introduce a new option to make it work like this, but most
users are likely to prefer the command to have this behavior as the
default one. Introducing a new option would require another dumb loop
to look for that option early, which isn't nice.

Also we made `git rev-list` work with missing commits very recently
and the command is most often passed commits as arguments. So let's
consider this as a bug fix related to these recent changes.

While at it let's add a NEEDSWORK comment to say that we should get
rid of the existing ugly dumb loops that parse the
`--exclude-promisor-objects` and `--missing=...` options early.

Helped-by: Linus Arver <linusa@google.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-14 09:39:14 -08:00
Christian Couder 3ff56af99b revision: clarify a 'return NULL' in get_reference()
When we know a pointer variable is NULL, it's clearer to
explicitly return NULL than to return that variable.

In get_reference(), when 'object' is NULL, we already return NULL
when 'revs->exclude_promisor_objects && is_promisor_object(oid)' is
true, but we return 'object' when 'revs->ignore_missing' is true.

Let's make the code clearer and more uniform by also explicitly
returning NULL when 'revs->ignore_missing' is true.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-14 09:38:22 -08:00
Junio C Hamano 0f7a10a3aa Merge branch 'en/header-cleanup' into maint-2.43
Remove unused header "#include".

* en/header-cleanup:
  treewide: remove unnecessary includes in source files
  treewide: add direct includes currently only pulled in transitively
  trace2/tr2_tls.h: remove unnecessary include
  submodule-config.h: remove unnecessary include
  pkt-line.h: remove unnecessary include
  line-log.h: remove unnecessary include
  http.h: remove unnecessary include
  fsmonitor--daemon.h: remove unnecessary includes
  blame.h: remove unnecessary includes
  archive.h: remove unnecessary include
  treewide: remove unnecessary includes in source files
  treewide: remove unnecessary includes from header files
2024-02-08 16:22:10 -08:00
Junio C Hamano 6479e121c2 Merge branch 'rs/incompatible-options-messages' into maint-2.43
Clean-up code that handles combinations of incompatible options.

* rs/incompatible-options-messages:
  worktree: simplify incompatibility message for --orphan and commit-ish
  worktree: standardize incompatibility messages
  clean: factorize incompatibility message
  revision, rev-parse: factorize incompatibility messages about - -exclude-hidden
  revision: use die_for_incompatible_opt3() for - -graph/--reverse/--walk-reflogs
  repack: use die_for_incompatible_opt3() for -A/-k/--cruft
  push: use die_for_incompatible_opt4() for - -delete/--tags/--all/--mirror
2024-02-08 16:22:06 -08:00
Junio C Hamano 25e2039cf6 Merge branch 'rs/i18n-cannot-be-used-together' into maint-2.43
Clean-up code that handles combinations of incompatible options.

* rs/i18n-cannot-be-used-together:
  i18n: factorize even more 'incompatible options' messages
2024-02-08 16:22:05 -08:00
Junio C Hamano 4b50f86141 Merge branch 'jc/revision-parse-int' into maint-2.43
The command line parser for the "log" family of commands was too
loose when parsing certain numbers, e.g., silently ignoring the
extra 'q' in "git log -n 1q" without complaining, which has been
tightened up.

* jc/revision-parse-int:
  revision: parse integer arguments to --max-count, --skip, etc., more carefully
2024-02-08 16:22:02 -08:00
Junio C Hamano 492ee03f60 Merge branch 'en/header-cleanup'
Remove unused header "#include".

* en/header-cleanup:
  treewide: remove unnecessary includes in source files
  treewide: add direct includes currently only pulled in transitively
  trace2/tr2_tls.h: remove unnecessary include
  submodule-config.h: remove unnecessary include
  pkt-line.h: remove unnecessary include
  line-log.h: remove unnecessary include
  http.h: remove unnecessary include
  fsmonitor--daemon.h: remove unnecessary includes
  blame.h: remove unnecessary includes
  archive.h: remove unnecessary include
  treewide: remove unnecessary includes in source files
  treewide: remove unnecessary includes from header files
2024-01-08 14:05:15 -08:00
Elijah Newren eea0e59ffb treewide: remove unnecessary includes in source files
Each of these were checked with
   gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE}
to ensure that removing the direct inclusion of the header actually
resulted in that header no longer being included at all (i.e. that
no other header pulled it in transitively).

...except for a few cases where we verified that although the header
was brought in transitively, nothing from it was directly used in
that source file.  These cases were:
  * builtin/credential-cache.c
  * builtin/pull.c
  * builtin/send-pack.c

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-26 12:04:31 -08:00
Junio C Hamano 3c8f932d35 Merge branch 'rs/incompatible-options-messages'
Clean-up code that handles combinations of incompatible options.

* rs/incompatible-options-messages:
  worktree: simplify incompatibility message for --orphan and commit-ish
  worktree: standardize incompatibility messages
  clean: factorize incompatibility message
  revision, rev-parse: factorize incompatibility messages about - -exclude-hidden
  revision: use die_for_incompatible_opt3() for - -graph/--reverse/--walk-reflogs
  repack: use die_for_incompatible_opt3() for -A/-k/--cruft
  push: use die_for_incompatible_opt4() for - -delete/--tags/--all/--mirror
2023-12-20 10:14:53 -08:00
Junio C Hamano c4bf868bee Merge branch 'jc/revision-parse-int'
The command line parser for the "log" family of commands was too
loose when parsing certain numbers, e.g., silently ignoring the
extra 'q' in "git log -n 1q" without complaining, which has been
tightened up.

* jc/revision-parse-int:
  revision: parse integer arguments to --max-count, --skip, etc., more carefully
2023-12-20 10:14:53 -08:00
Junio C Hamano 71a1e94821 revision: parse integer arguments to --max-count, --skip, etc., more carefully
The "rev-list" and other commands in the "log" family, being the
oldest part of the system, use their own custom argument parsers,
and integer values of some options are parsed with atoi(), which
allows a non-digit after the number (e.g., "1q") to be silently
ignored.  As a natural consequence, an argument that does not begin
with a digit (e.g., "q") silently becomes zero, too.

Switch to use strtol_i() and parse_timestamp() appropriately to
catch bogus input.

Note that one may naïvely expect that --max-count, --skip, etc., to
only take non-negative values, but we must allow them to also take
negative values, as an escape hatch to countermand a limit set by an
earlier option on the command line; the underlying variables are
initialized to (-1) and "--max-count=-1", for example, is a
legitimate way to reinitialize the limit.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09 08:57:31 +09:00
René Scharfe 81fb70f55e revision, rev-parse: factorize incompatibility messages about - -exclude-hidden
Use the standard parameterized message for reporting incompatible
options to report options that are not accepted in combination with
--exclude-hidden.  This reduces the number of strings to translate and
makes the UI a bit more consistent.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09 07:41:03 +09:00
René Scharfe fa518aef56 revision: use die_for_incompatible_opt3() for - -graph/--reverse/--walk-reflogs
The revision option --reverse is incompatible with --walk-reflogs and
--graph is incompatible with both --reverse and --walk-reflogs.  So they
are all incompatible with each other.

Use the function for checking three mutually incompatible options,
die_for_incompatible_opt3(), to perform this check in one place and
without repetition.  This is shorter and clearer.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-09 07:40:44 +09:00
René Scharfe 7854bf4960 i18n: factorize even more 'incompatible options' messages
Continue the work of 12909b6b8a (i18n: turn "options are incompatible"
into "cannot be used together", 2022-01-05) and a699367bb8 (i18n:
factorize more 'incompatible options' messages, 2022-01-31) to use the
same parameterized error message for reporting incompatible command line
options.  This reduces the number of strings to translate and makes the
UI slightly more consistent.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-27 10:01:45 +09:00
Karthik Nayak 9830926c7d rev-list: add commit object support in --missing option
The `--missing` object option in rev-list currently works only with
missing blobs/trees. For missing commits the revision walker fails with
a fatal error.

Let's extend the functionality of `--missing` option to also support
commit objects. This is done by adding a `missing_objects` field to
`rev_info`. This field is an `oidset` to which we'll add the missing
commits as we encounter them. The revision walker will now continue the
traversal and call `show_commit()` even for missing commits. In rev-list
we can then check if the commit is a missing commit and call the
existing code for parsing `--missing` objects.

A scenario where this option would be used is to find the boundary
objects between different object directories. Consider a repository with
a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
repository, using the `--missing=print` option while disabling the
alternate object directory allows us to find the boundary objects
between the main and alternate object directory.

Helped-by: Patrick Steinhardt <ps@pks.im>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-11-01 12:07:18 +09:00
Junio C Hamano 2920971a7f Merge branch 'jk/decoration-and-other-leak-fixes'
Leakfix.

* jk/decoration-and-other-leak-fixes:
  daemon: free listen_addr before returning
  revision: clear decoration structs during release_revisions()
  decorate: add clear_decoration() function
2023-10-13 14:18:28 -07:00
Jeff King 8ef8da4842 revision: clear decoration structs during release_revisions()
The point of release_revisions() is to free memory associated with the
rev_info struct, but we have several "struct decoration" members that
are left untouched. Since the previous commit introduced a function to
do that, we can just call it.

We do have to provide some specialized callbacks to map the void
pointers onto real ones (the alternative would be casting the existing
function pointers; this generally works because "void *" is usually
interchangeable with a struct pointer, but it is technically forbidden
by the standard).

Since the line-log code does not expose the type it stores in the
decoration (nor of course the function to free it), I put this behind a
generic line_log_free() entry point. It's possible we may need to add
more line-log specific bits anyway (running t4211 shows a number of
other leaks in the line-log code).

While this doubtless cleans up many leaks triggered by the test suite,
the only script which becomes leak-free is t4217, as it does very little
beyond a simple traversal (its existing leak was from the use of
--children, which is now fixed).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-05 14:54:57 -07:00
Junio C Hamano 3029189186 Merge branch 'ps/revision-cmdline-stdin-not'
"git rev-list --stdin" learned to take non-revisions (like "--not")
recently from the standard input, but the way such a "--not" was
handled was quite confusing, which has been rethought.  This is
potentially a change that breaks backward compatibility.

* ps/revision-cmdline-stdin-not:
  revision: make pseudo-opt flags read via stdin behave consistently
2023-10-04 13:28:52 -07:00
Eric W. Biederman efed687edc tree-walk: init_tree_desc take an oid to get the hash algorithm
To make it possible for git ls-tree to display the tree encoded
in the hash algorithm of the oid specified to git ls-tree, update
init_tree_desc to take as a parameter the oid of the tree object.

Update all callers of init_tree_desc and init_tree_desc_gently
to pass the oid of the tree object.

Use the oid of the tree object to discover the hash algorithm
of the oid and store that hash algorithm in struct tree_desc.

Use the hash algorithm in decode_tree_entry and
update_tree_entry_internal to handle reading a tree object encoded in
a hash algorithm that differs from the repositories hash algorithm.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-02 14:57:40 -07:00