Commit graph

10628 commits

Author SHA1 Message Date
Junio C Hamano 80bdaba894 messages: mark some strings with "up-to-date" not to touch
The treewide clean-up of "up-to-date" strings done in 7560f547
(treewide: correct several "up-to-date" to "up to date", 2017-08-23)
deliberately left some out, but unlike the lines that were changed
by the commit, the lines that were deliberately left untouched by
the commit is impossible to ask "git blame" to link back to the
commit that did not touch them.

Let's do the second best thing, leave a short comment near them
explaining why those strings should not be modified or localized.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
[es: make in-code comment more developer-friendly]
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-12 10:20:01 -08:00
Johannes Schindelin 6487e9c459 Sync with 2.37.6
* maint-2.37:
  Git 2.37.6
  Git 2.36.5
  Git 2.35.7
  Git 2.34.7
  http: support CURLOPT_PROTOCOLS_STR
  http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
  http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
  Git 2.33.7
  Git 2.32.6
  Git 2.31.7
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:43:28 +01:00
Johannes Schindelin 16004682f9 Sync with 2.36.5
* maint-2.36:
  Git 2.36.5
  Git 2.35.7
  Git 2.34.7
  http: support CURLOPT_PROTOCOLS_STR
  http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
  http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
  Git 2.33.7
  Git 2.32.6
  Git 2.31.7
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:38:31 +01:00
Johannes Schindelin 40843216c5 Sync with 2.35.7
* maint-2.35:
  Git 2.35.7
  Git 2.34.7
  http: support CURLOPT_PROTOCOLS_STR
  http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
  http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
  Git 2.33.7
  Git 2.32.6
  Git 2.31.7
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:37:52 +01:00
Johannes Schindelin 6a53a59bf9 Sync with 2.34.7
* maint-2.34:
  Git 2.34.7
  http: support CURLOPT_PROTOCOLS_STR
  http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
  http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
  Git 2.33.7
  Git 2.32.6
  Git 2.31.7
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:29:44 +01:00
Johannes Schindelin a7237f5ae9 Sync with 2.33.7
* maint-2.33:
  Git 2.33.7
  Git 2.32.6
  Git 2.31.7
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:29:16 +01:00
Johannes Schindelin 87248c5933 Sync with 2.32.6
* maint-2.32:
  Git 2.32.6
  Git 2.31.7
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:25:56 +01:00
Johannes Schindelin aeb93d7da2 Sync with 2.31.7
* maint-2.31:
  Git 2.31.7
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:25:08 +01:00
Johannes Schindelin e14d6b8408 Sync with 2.30.8
* maint-2.30:
  Git 2.30.8
  apply: fix writing behind newly created symbolic links
  dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS
  clone: delay picking a transport until after get_repo_path()
  t5619: demonstrate clone_local() with ambiguous transport
2023-02-06 09:24:06 +01:00
Taylor Blau cf8f6ce02a clone: delay picking a transport until after get_repo_path()
In the previous commit, t5619 demonstrates an issue where two calls to
`get_repo_path()` could trick Git into using its local clone mechanism
in conjunction with a non-local transport.

That sequence is:

 - the starting state is that the local path https:/example.com/foo is a
   symlink that points to ../../../.git/modules/foo. So it's dangling.

 - get_repo_path() sees that no such path exists (because it's
   dangling), and thus we do not canonicalize it into an absolute path

 - because we're using --separate-git-dir, we create .git/modules/foo.
   Now our symlink is no longer dangling!

 - we pass the url to transport_get(), which sees it as an https URL.

 - we call get_repo_path() again, on the url. This second call was
   introduced by f38aa83f9a (use local cloning if insteadOf makes a
   local URL, 2014-07-17). The idea is that we want to pull the url
   fresh from the remote.c API, because it will apply any aliases.

And of course now it sees that there is a local file, which is a
mismatch with the transport we already selected.

The issue in the above sequence is calling `transport_get()` before
deciding whether or not the repository is indeed local, and not passing
in an absolute path if it is local.

This is reminiscent of a similar bug report in [1], where it was
suggested to perform the `insteadOf` lookup earlier. Taking that
approach may not be as straightforward, since the intent is to store the
original URL in the config, but to actually fetch from the insteadOf
one, so conflating the two early on is a non-starter.

Note: we pass the path returned by `get_repo_path(remote->url[0])`,
which should be the same as `repo_name` (aside from any `insteadOf`
rewrites).

We *could* pass `absolute_pathdup()` of the same argument, which
86521acaca (Bring local clone's origin URL in line with that of a remote
clone, 2008-09-01) indicates may differ depending on the presence of
".git/" for a non-bare repo. That matters for forming relative submodule
paths, but doesn't matter for the second call, since we're just feeding
it to the transport code, which is fine either way.

[1]: https://lore.kernel.org/git/CAMoD=Bi41mB3QRn3JdZL-FGHs4w3C2jGpnJB-CqSndO7FMtfzA@mail.gmail.com/

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-24 16:52:16 -08:00
Junio C Hamano 64de207727 Merge branch 'rj/branch-edit-desc-unborn' into maint-2.38
"git branch --edit-description" on an unborh branch misleadingly
said that no such branch exists, which has been corrected.

* rj/branch-edit-desc-unborn:
  branch: description for non-existent branch errors
2022-10-27 15:24:13 -07:00
Junio C Hamano 1f49b5171a Merge branch 'jk/cleanup-callback-parameters' into maint-2.38
Code clean-up.

* jk/cleanup-callback-parameters:
  attr: drop DEBUG_ATTR code
  commit: avoid writing to global in option callback
  multi-pack-index: avoid writing to global in option callback
  test-submodule: inline resolve_relative_url() function
2022-10-25 17:11:39 -07:00
Junio C Hamano 28f9cd0d5f Merge branch 'rs/gc-pack-refs-simplify' into maint-2.38
Code clean-up.

* rs/gc-pack-refs-simplify:
  gc: simplify maintenance_task_pack_refs()
2022-10-25 17:11:39 -07:00
Junio C Hamano 3ae0094a91 Merge branch 'rs/bisect-start-leakfix' into maint-2.38
Code clean-up that results in plugging a leak.

* rs/bisect-start-leakfix:
  bisect--helper: plug strvec leak
2022-10-25 17:11:37 -07:00
Junio C Hamano 1155c8efbb Merge branch 'jc/branch-description-unset' into maint-2.38
"GIT_EDITOR=: git branch --edit-description" resulted in failure,
which has been corrected.

* jc/branch-description-unset:
  branch: do not fail a no-op --edit-desc
2022-10-25 17:11:37 -07:00
Junio C Hamano cf96b393d6 Merge branch 'jk/fsck-on-diet' into maint-2.38
"git fsck" failed to release contents of tree objects already used
from the memory, which has been fixed.

* jk/fsck-on-diet:
  parse_object_buffer(): respect save_commit_buffer
  fsck: turn off save_commit_buffer
  fsck: free tree buffers after walking unreachable objects
2022-10-25 17:11:33 -07:00
Junio C Hamano 1655ac884a Merge branch 'ah/fsmonitor-daemon-usage-non-l10n' into maint-2.38
Fix messages incorrectly marked for translation.

* ah/fsmonitor-daemon-usage-non-l10n:
  fsmonitor--daemon: don't translate literal commands
2022-10-25 17:11:33 -07:00
Junio C Hamano 0d5d92906a Merge branch 'jk/clone-allow-bare-and-o-together' into maint-2.38
"git clone" did not like to see the "--bare" and the "--origin"
options used together without a good reason.

* jk/clone-allow-bare-and-o-together:
  clone: allow "--bare" with "-o"
2022-10-25 17:11:33 -07:00
Junio C Hamano 665d7e08b4 Merge branch 'jk/remote-rename-without-fetch-refspec' into maint-2.38
"git remote rename" failed to rename a remote without fetch
refspec, which has been corrected.

* jk/remote-rename-without-fetch-refspec:
  remote: handle rename of remote without fetch refspec
2022-10-25 17:11:32 -07:00
Rubén Justo bcfc82bd48 branch: description for non-existent branch errors
When the repository does not yet have commits, some errors describe that
there is no branch:

    $ git init -b first

    $ git branch --edit-description first
    error: No branch named 'first'.

    $ git branch --set-upstream-to=upstream
    fatal: branch 'first' does not exist

    $ git branch -c second
    error: refname refs/heads/first not found
    fatal: Branch copy failed

That "first" branch is unborn but to say it doesn't exists is confusing.

Options "-c" (copy) and "-m" (rename) show the same error when the
origin branch doesn't exists:

    $ git branch -c non-existent-branch second
    error: refname refs/heads/non-existent-branch not found
    fatal: Branch copy failed

    $ git branch -m non-existent-branch second
    error: refname refs/heads/non-existent-branch not found
    fatal: Branch rename failed

Note that "--edit-description" without an explicit argument is already
considering the _empty repository_ circumstance in its error.  Also note
that "-m" on the initial branch it is an allowed operation.

Make the error descriptions for those branch operations with unborn or
non-existent branches, more informative.

This is the result of the change:

    $ git init -b first

    $ git branch --edit-description first
    error: No commit on branch 'first' yet.

    $ git branch --set-upstream-to=upstream
    fatal: No commit on branch 'first' yet.

    $ git branch -c second
    fatal: No commit on branch 'first' yet.

    $ git branch [-c/-m] non-existent-branch second
    fatal: No branch named 'non-existent-branch'.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-07 20:59:41 -07:00
René Scharfe 246526d019 bisect--helper: plug strvec leak
The strvec "argv" is used to build a command for run_command_v_opt(),
but never freed.  Use a constant string array instead, which doesn't
require any cleanup.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-07 10:21:18 -07:00
Taylor Blau f64d4ca8d6 Sync with 2.37.4
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06 20:00:04 -04:00
Taylor Blau f2798aa404 Sync with 2.36.3
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06 19:58:16 -04:00
Taylor Blau 58612f82b6 Sync with 2.35.5
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06 17:44:44 -04:00
Taylor Blau ac8a1db867 Sync with 2.34.5
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06 17:43:37 -04:00
Taylor Blau 478a426f14 Sync with 2.33.5
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06 17:42:55 -04:00
Taylor Blau 3957f3c84e Sync with 2.32.4
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06 17:42:02 -04:00
Taylor Blau 9cbd2827c5 Sync with 2.31.5
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06 17:40:44 -04:00
Taylor Blau 122512967e Sync with 2.30.6
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-06 17:39:15 -04:00
Jeff King 116761ba9c commit: avoid writing to global in option callback
The callback function for --trailer writes directly to the global
trailer_args and ignores opt->value completely. This is OK, since that's
where we expect to find the value. But it does mean the option
declaration isn't as clear. E.g., we have:

    OPT_BOOL(0, "reset-author", &renew_authorship, ...),
    OPT_CALLBACK_F(0, "trailer", NULL, ..., opt_pass_trailer)

In the first one we can see where the result will be stored, but in the
second, we get only NULL, and you have to go read the callback.

Let's pass &trailer_args, and use it in the callback. As a bonus, this
silences a -Wunused-parameter warning.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-06 09:58:06 -07:00
Jeff King 7faba18a9a multi-pack-index: avoid writing to global in option callback
We declare the --object-dir option like:

  OPT_CALLBACK(0, "object-dir", &opts.object_dir, ...);

but the pointer to opts.object_dir is completely unused. Instead, the
callback writes directly to a global. Which fortunately happens to be
opts.object_dir. So everything works as expected, but it's unnecessarily
confusing.

Instead, let's have the callback write to the option value pointer that
has been passed in. This also quiets a -Wunused-parameter warning (since
we don't otherwise look at "opt").

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-06 09:56:51 -07:00
René Scharfe b004c90282 gc: simplify maintenance_task_pack_refs()
Pass a constant string array directly to run_command_v_opt() instead of
copying it into a strvec first.  This shortens the code and avoids heap
allocations.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-05 12:46:27 -07:00
Taylor Blau 6f054f9fb3 builtin/clone.c: disallow --local clones with symlinks
When cloning a repository with `--local`, Git relies on either making a
hardlink or copy to every file in the "objects" directory of the source
repository. This is done through the callpath `cmd_clone()` ->
`clone_local()` -> `copy_or_link_directory()`.

The way this optimization works is by enumerating every file and
directory recursively in the source repository's `$GIT_DIR/objects`
directory, and then either making a copy or hardlink of each file. The
only exception to this rule is when copying the "alternates" file, in
which case paths are rewritten to be absolute before writing a new
"alternates" file in the destination repo.

One quirk of this implementation is that it dereferences symlinks when
cloning. This behavior was most recently modified in 36596fd2df (clone:
better handle symlinked files at .git/objects/, 2019-07-10), which
attempted to support `--local` clones of repositories with symlinks in
their objects directory in a platform-independent way.

Unfortunately, this behavior of dereferencing symlinks (that is,
creating a hardlink or copy of the source's link target in the
destination repository) can be used as a component in attacking a
victim by inadvertently exposing the contents of file stored outside of
the repository.

Take, for example, a repository that stores a Dockerfile and is used to
build Docker images. When building an image, Docker copies the directory
contents into the VM, and then instructs the VM to execute the
Dockerfile at the root of the copied directory. This protects against
directory traversal attacks by copying symbolic links as-is without
dereferencing them.

That is, if a user has a symlink pointing at their private key material
(where the symlink is present in the same directory as the Dockerfile,
but the key itself is present outside of that directory), the key is
unreadable to a Docker image, since the link will appear broken from the
container's point of view.

This behavior enables an attack whereby a victim is convinced to clone a
repository containing an embedded submodule (with a URL like
"file:///proc/self/cwd/path/to/submodule") which has a symlink pointing
at a path containing sensitive information on the victim's machine. If a
user is tricked into doing this, the contents at the destination of
those symbolic links are exposed to the Docker image at runtime.

One approach to preventing this behavior is to recreate symlinks in the
destination repository. But this is problematic, since symlinking the
objects directory are not well-supported. (One potential problem is that
when sharing, e.g. a "pack" directory via symlinks, different writers
performing garbage collection may consider different sets of objects to
be reachable, enabling a situation whereby garbage collecting one
repository may remove reachable objects in another repository).

Instead, prohibit the local clone optimization when any symlinks are
present in the `$GIT_DIR/objects` directory of the source repository.
Users may clone the repository again by prepending the "file://" scheme
to their clone URL, or by adding the `--no-local` option to their `git
clone` invocation.

The directory iterator used by `copy_or_link_directory()` must no longer
dereference symlinks (i.e., it *must* call `lstat()` instead of `stat()`
in order to discover whether or not there are symlinks present). This has
no bearing on the overall behavior, since we will immediately `die()` on
encounter a symlink.

Note that t5604.33 suggests that we do support local clones with
symbolic links in the source repository's objects directory, but this
was likely unintentional, or at least did not take into consideration
the problem with sharing parts of the objects directory with symbolic
links at the time. Update this test to reflect which options are and
aren't supported.

Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-01 00:23:38 -04:00
Junio C Hamano e288b3de35 branch: do not fail a no-op --edit-desc
Imagine running "git branch --edit-description" while on a branch
without the branch description, and then exit the editor after
emptying the edit buffer, which is the way to tell the command that
you changed your mind and you do not want the description after all.

The command should just happily oblige, adding no branch description
for the current branch, and exit successfully.  But it fails to do
so:

    $ git init -b main
    $ git commit --allow-empty -m commit
    $ GIT_EDITOR=: git branch --edit-description
    fatal: could not unset 'branch.main.description'

The end result is OK in that the configuration variable does not
exist in the resulting repository, but we should do better.  If we
know we didn't have a description, and if we are asked not to have a
description by the editor, we can just return doing nothing.

This of course introduces TOCTOU.  If you add a branch description
to the same branch from another window, while you had the editor
open to edit the description, and then exit the editor without
writing anything there, we'd end up not removing the description you
added in the other window.  But you are fooling yourself in your own
repository at that point, and if it hurts, you'd be better off not
doing so ;-).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-30 11:13:51 -07:00
Jeff King 5a97b38109 remote: handle rename of remote without fetch refspec
We return an error when trying to rename a remote that has no fetch
refspec:

  $ git config --unset-all remote.origin.fetch
  $ git remote rename origin foo
  fatal: could not unset 'remote.foo.fetch'

To make things even more confusing, we actually _do_ complete the config
modification, via git_config_rename_section(). After that we try to
rewrite the fetch refspec (to say refs/remotes/foo instead of origin).
But our call to git_config_set_multivar() to remove the existing entries
fails, since there aren't any, and it calls die().

We could fix this by using the "gently" form of the config call, and
checking the error code. But there is an even simpler fix: if we know
that there are no refspecs to rewrite, then we can skip that part
entirely.

Reported-by: John A. Leuenhagen <john@zlima12.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-22 12:59:52 -07:00
Jeff King 3b910d6e29 clone: allow "--bare" with "-o"
We explicitly forbid the combination of "--bare" with "-o", but there
doesn't seem to be any good reason to do so. The original logic came as
part of e6489a1bdf (clone: do not accept more than one -o option.,
2006-01-22), but that commit does not give any reason.

Furthermore, the equivalent combination via config is allowed:

  git -c clone.defaultRemoteName=foo clone ...

and works as expected. It may be that this combination was considered
useless, because a bare clone does not set remote.origin.fetch (and
hence there is no refs/remotes/origin hierarchy). But it does set
remote.origin.url, and that name is visible to the user via "git fetch
origin", etc.

Let's allow the options to be used together, and switch the "forbid"
test in t5606 to check that we use the requested name. That test came
much later in 349cff76de (clone: add tests for --template and some
disallowed option pairs, 2020-09-29), and does not offer any logic
beyond "let's test what the code currently does".

Reported-by: John A. Leuenhagen <john@zlima12.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-22 12:57:03 -07:00
Jeff King 51b27747e5 parse_object_buffer(): respect save_commit_buffer
If the global variable "save_commit_buffer" is set to 0, then
parse_commit() will throw away the commit object data after parsing it,
rather than sticking it into a commit slab. This goes all the way back
to 60ab26de99 ([PATCH] Avoid wasting memory in git-rev-list,
2005-09-15).

But there's another code path which may similarly stash the buffer:
parse_object_buffer(). This is where we end up if we parse a commit via
parse_object(), and it's used directly in a few other code paths like
git-fsck.

The original goal of 60ab26de99 was avoiding extra memory usage for
rev-list. And there it's not all that important to catch parse_object().
We use that function only for looking at the tips of the traversal, and
the majority of the commits are parsed by following parent links, where
we use parse_commit() directly. So we were wasting some memory, but only
a small portion.

It's much easier to see the effect with fsck. Since we now turn off
save_commit_buffer by default there, we _should_ be able to drop the
freeing of the commit buffer in fsck_obj(). But if we do so (taking the
first hunk of this patch without the rest), then the peak heap of "git
fsck" in a clone of git.git goes from 136MB to 194MB. Teaching
parse_object_buffer() to respect save_commit_buffer brings that down to
134.5MB (it's hard to tell from massif's output, but I suspect the
savings comes from avoiding the overhead of the mostly-empty commit
slab).

Other programs should see a small improvement. Both "rev-list --all" and
"fsck --connectivity-only" improve by a few hundred kilobytes, as they'd
avoid loading the tip objects of their traversals.

Most importantly, no code should be hurt by doing this. Any program that
turns off save_commit_buffer is already making the assumption that any
commit it sees may need to have its object data loaded on demand, as it
doesn't know which ones were parsed by parse_commit() versus
parse_object(). Not to mention that anything parsed by the commit graph
may be in the same boat, even if save_commit_buffer was not disabled.

This should be the only spot that needs to be fixed. Grepping for
set_commit_buffer() shows that this and parse_commit() are the only
relevant calls.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-22 11:40:47 -07:00
Jeff King 069e445256 fsck: turn off save_commit_buffer
When parsing a commit, the default behavior is to stuff the original
buffer into a commit_slab (which takes ownership of it). But for a tool
like fsck, this isn't useful. While we may look at the buffer further as
part of fsck_commit(), we'll always do so through a separate pointer;
attaching the buffer to the slab doesn't help.

Worse, it means we have to remember to free the commit buffer in all
call paths. We do so in fsck_obj(), which covers a regular "git fsck".
But with "--connectivity-only", we forget to do so in both
traverse_one_object(), which covers reachable objects, and
mark_unreachable_referents(), which covers unreachable ones. As a
result, that mode ends up storing an uncompressed copy of every commit
on the heap at once.

We could teach the code paths for --connectivity-only to also free
commit buffers. But there's an even easier fix: we can just turn off the
save_commit_buffer flag, and then we won't attach them to the commits in
the first place.

This reduces the peak heap of running "git fsck --connectivity-only" in
a clone of linux.git from ~2GB to ~1GB. According to massif, the
remaining memory goes where you'd expect: the object structs themselves,
the obj_hash containing them, and the delta base cache.

Note that we'll leave the call to free commit buffers in fsck_obj() for
now; it's not quite redundant because of a related bug that we'll fix in
a subsequent commit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-22 11:40:11 -07:00
Jeff King fbce4fa9ae fsck: free tree buffers after walking unreachable objects
After calling fsck_walk(), a tree object struct may be left in the
parsed state, with the full tree contents available via tree->buffer.
It's the responsibility of the caller to free these when it's done with
the object to avoid having many trees allocated at once.

In a regular "git fsck", we hit fsck_walk() only from fsck_obj(), which
does call free_tree_buffer(). Likewise for "--connectivity-only", we see
most objects via traverse_one_object(), which makes a similar call.

The exception is in mark_unreachable_referents(). When using both
"--connectivity-only" and "--dangling" (the latter of which is the
default), we walk all of the unreachable objects, and there we forget to
free. Most cases would not notice this, because they don't have a lot of
unreachable objects, but you can make a pathological case like this:

  git clone --bare /path/to/linux.git repo.git
  cd repo.git
  rm packed-refs ;# now everything is unreachable!
  git fsck --connectivity-only

That ends up with peak heap usage ~18GB, which is (not coincidentally)
close to the size of all uncompressed trees in the repository. After
this patch, the peak heap is only ~2GB.

A few things to note:

  - it might seem like fsck_walk(), if it is parsing the trees, should
    be responsible for freeing them. But the situation is quite tricky.
    In the non-connectivity mode, after we call fsck_walk() we then
    proceed with fsck_object() which actually does the type-specific
    sanity checks on the object contents. We do pass our own separate
    buffer to fsck_object(), but there's a catch: our earlier call to
    parse_object_buffer() may have attached that buffer to the object
    struct! So by freeing it, we leave the rest of the code with a
    dangling pointer.

    Likewise, the call to fsck_walk() in index-pack is subtle. It
    attaches a buffer to the tree object that must not be freed! And
    so rather than calling free_tree_buffer(), it actually detaches it
    by setting tree->buffer to NULL.

    These cases would _probably_ be fixable by having fsck_walk() free
    the tree buffer only when it was the one who allocated it via
    parse_tree(). But that would still leave the callers responsible for
    freeing other cases, so they wouldn't be simplified. While the
    current semantics for fsck_walk() make it easy to accidentally leak
    in new callers, at least they are simple to explain, and it's not a
    function that's likely to get a lot of new call-sites.

    And in any case, it's probably sensible to fix the leak first with
    this simple patch, and try any more complicated refactoring
    separately.

  - a careful reader may notice that fsck_obj() also frees commit
    buffers, but neither the call in traverse_one_object() nor the one
    touched in this patch does so. And indeed, this is another problem
    for --connectivity-only (and accounts for most of the 2GB heap after
    this patch), but it's one we'll fix in a separate commit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-22 11:30:06 -07:00
Junio C Hamano 04cc66fe8c Merge branch 'sg/parse-options-subcommand'
Fix messages incorrectly marked for translation.

* sg/parse-options-subcommand:
  gc: don't translate literal commands
2022-09-21 15:27:03 -07:00
Junio C Hamano 86c108a8a2 Merge branch 'vd/scalar-generalize-diagnose'
Portability fix.

* vd/scalar-generalize-diagnose:
  builtin/diagnose.c: don't translate the two mode values
  diagnose.c: refactor to safely use 'd_type'
2022-09-21 15:27:01 -07:00
Alex Henrie 02cb8b9ee3 fsmonitor--daemon: don't translate literal commands
These commands have no placeholders to be translated.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-21 11:56:42 -07:00
Alex Henrie d956fa8082 builtin/diagnose.c: don't translate the two mode values
These strings are not translatable in the diagnose_options array in
diagnose.c. Don't translate them in builtin/diagnose.c either.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-21 11:53:35 -07:00
Alex Henrie 8b74492135 gc: don't translate literal commands
The command you type is still "git maintenance" even in other languages.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-21 10:43:10 -07:00
Junio C Hamano 42bf77c7d0 Merge branch 'vd/scalar-to-main'
Hoist the remainder of "scalar" out of contrib/ to the main part of
the codebase.

* vd/scalar-to-main:
  Documentation/technical: include Scalar technical doc
  t/perf: add 'GIT_PERF_USE_SCALAR' run option
  t/perf: add Scalar performance tests
  scalar-clone: add test coverage
  scalar: add to 'git help -a' command list
  scalar: implement the `help` subcommand
  git help: special-case `scalar`
  scalar: include in standard Git build & installation
  scalar: fix command documentation section header
2022-09-19 14:35:25 -07:00
Junio C Hamano 298a958224 Merge branch 'jk/list-objects-filter-cleanup'
A couple of bugfixes with code clean-up.

* jk/list-objects-filter-cleanup:
  list-objects-filter: convert filter_spec to a strbuf
  list-objects-filter: add and use initializers
  list-objects-filter: handle null default filter spec
  list-objects-filter: don't memset after releasing filter struct
2022-09-19 14:35:24 -07:00
Junio C Hamano f876b5a686 Merge branch 'zh/ls-files-format'
Typofix in the UI of a topic that has graduated to 'master'.

* zh/ls-files-format:
  ls-files: fix black space in error message
2022-09-19 14:35:24 -07:00
Junio C Hamano 339517b035 Merge branch 'sy/mv-out-of-cone'
"git mv A B" in a sparsely populated working tree can be asked to
move a path from a directory that is "in cone" to another directory
that is "out of cone".  Handling of such a case has been improved.

* sy/mv-out-of-cone:
  builtin/mv.c: fix possible segfault in add_slash()
  mv: check overwrite for in-to-out move
  advice.h: add advise_on_moving_dirty_path()
  mv: cleanup empty WORKING_DIRECTORY
  mv: from in-cone to out-of-cone
  mv: remove BOTH from enum update_mode
  mv: check if <destination> is a SKIP_WORKTREE_DIR
  mv: free the with_slash in check_dir_in_index()
  mv: rename check_dir_in_index() to empty_dir_has_sparse_contents()
  t7002: add tests for moving from in-cone to out-of-cone
2022-09-19 14:35:23 -07:00
Junio C Hamano ca20a44bc5 Merge branch 'jk/proto-v2-ref-prefix-fix'
"git fetch" over protocol v2 sent an incorrect ref prefix request
to the server and made "git pull" with configured fetch refspec
that does not cover the remote branch to merge with fail, which has
been corrected.

* jk/proto-v2-ref-prefix-fix:
  fetch: add branch.*.merge to default ref-prefix extension
  fetch: stop checking for NULL transport->remote in do_fetch()
2022-09-15 16:09:47 -07:00
Junio C Hamano b563638d2c Merge branch 'ab/submodule-helper-leakfix'
Plugging leaks in submodule--helper.

* ab/submodule-helper-leakfix:
  submodule--helper: fix a configure_added_submodule() leak
  submodule--helper: free rest of "displaypath" in "struct update_data"
  submodule--helper: free some "displaypath" in "struct update_data"
  submodule--helper: fix a memory leak in print_status()
  submodule--helper: fix a leak in module_add()
  submodule--helper: fix obscure leak in module_add()
  submodule--helper: fix "reference" leak
  submodule--helper: fix a memory leak in get_default_remote_submodule()
  submodule--helper: fix a leak with repo_clear()
  submodule--helper: fix "sm_path" and other "module_cb_list" leaks
  submodule--helper: fix "errmsg_str" memory leak
  submodule--helper: add and use *_release() functions
  submodule--helper: don't leak {run,capture}_command() cp.dir argument
  submodule--helper: "struct pathspec" memory leak in module_update()
  submodule--helper: fix most "struct pathspec" memory leaks
  submodule--helper: fix trivial get_default_remote_submodule() leak
  submodule--helper: fix a leak in "clone_submodule"
2022-09-14 12:56:40 -07:00